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Abstract 

Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. 
This is useful in several applications where the input data consists of an incomplete set of distances, 
and the output is a set of points in Euclidean space that realizes the given distances. We survey some 
of the theory of Euclidean distance geometry and some of the most important applications: molecular 
conformation, localization of sensor networks and statics. 
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1 Introduction 



In 1928, Menger gave a characterization of several geometric concepts (e.g. congruence, set convexity) 
in terms of distances [151j . The results found by Menger, and eventually completed and presented by 
Blumenthal |28| . originated a body of knowledge which goes under the name of Distance Geometry (DG). 
This survey paper is concerned with what we believe to be the fundamental problem in DG: 



Distance Geometry Problem (DGP). Given an integer K > and a simple undirected 
graph G = (V, E) whose edges arc weighted by a nonnegative function d : E ^ R+, determine 
whether there is a function x -.V ISJ^ such that: 

^{u,v]eE \\x{u)-x{v)\\^d{{u,v}). (1) 



Throughout this survey, we shall write x{v) as x^ and as duv or d{u,v); moreover, norms || • || 

will be Euclidean unless marked otherwise (see |57j for an account of existing distances). 

Given the vast extent of this field, we make no claim nor attempt to exhaustiveness. This survey is 
intended to give the reader an idea of what we believe to be the most important concepts of DG, keeping 
in mind our own particular application-oriented slant (i.e. molecular conformation). 

The function x satisfying ([I} is also called a realization of G in M.^ . If iJ is a subgraph of G and x is 
a realization of H, then x is a partial realization of G. If G is a given graph, then we sometimes indicate 
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its vertex set by V{G) and its edge set by E{G). 

Wc remark that, for Blumenthal, the fundamental problem of DG was what he called the "subset 
problem" Ch. IV §36, p. 91], i.e. finding necessary and sufficient conditions to decide whether a given 
matrix is a distance matrix (see Sect. [TTTT3)) . Specifically, for Euclidean distances, necessary conditions 
were (implicitly) found by Cayley [39] , who proved that five points in , four points on a plane and three 
points on a line will have zero Cayley-Menger determinant (see Sect. [2]). Some sufficient conditions were 
found by Menger [152] . who proved that it suffices to verify that all (if + 3) x (A' + 3) square submatrices 
of the given matrix are distance matrices (see |28l Thm. 38.1]; other necessary and sufficient conditions 
are given in Thm. [^7T|) . The most prominent difference is that a distance matrix essentially represents 
a complete weighted graph, whereas the DGP does not impose any structure on G. The first explicit 
mention wc found of the DGP as defined above dates 1978: 

The positioning problem arises when it is necessary to locate a set of geographically distributed 
objects using measurements of the distances between some object pairs. (Ycmini, [232| ) 

The explicit mention that only some object pairs have known distance makes the crucial transition from 
classical DG lore to the DGP. In the year following his 1978 paper, Yemini wrote another paper on the 
computational complexity of some problems in graph rigidity |233j . which introduced the position-location 
problem as the problem of determining the coordinates of a set of objects in space from a sparse set of 
distances. This was in contrast with typical structural rigidity results of the time, whose main focus was 
the determination of the rigidity of given frameworks (see |223j and references therein). Meanwhile, Saxe 
had published a paper in the same year |188| where the DGP was introduced as the K -embeddability 
problem and shown to be strongly NP-complete when K = 1 and strongly NP-hard for general K > 1. 

The interest of the DGP resides in the wealth of its applications (molecular conformation, wireless 
sensor networks, statics, data visualization and robotics among others), as well as in the beauty of the 
related mathematical theory. Our exposition will take the standpoint of a specific application which we 
have studied for a number of years, namely the determination of protein structure using Nuclear Magnetic 
Resonance (NMR) data. Two of the pioneers in this application of DG arc Crippen and Havel [5D]. A 
discussion about the relationship between DG and real-world problems in computational chemistry is 
presented in [i^ . 

NMR data is usually presented in current DG literature as consisting of a graph whose edges are 
weighted with intervals, which represent distance measurements with errors. This, however, is already 
the result of data manipulation carried out by the NMR specialists. The actual situation is more complex: 
the NMR machinery outputs some frequency readings for distance values related to pairs of atom types. 
Formally, one could imagine the NMR machinery as a black box whose input is a set of distinct atom 
type pairs {a, b} (e.g. {H, H}, {C, H} and so on), and whose output is a set of triplets ({a, b}, d, q). Their 
meaning is that q pairs of atoms of type a, b were observed to have (interval) distance d within the 
molecule being analysed. The chemical knowledge about a protein also includes other information, such 
as covalent bond and angles, certain torsion angles, and so on (see |189j for definitions of these chemical 
terms). Armed with this knowledge, NMR specialists are able to output an interval weighted graph which 
represents the molecule with a subset of its uncertain distances (this process, however, often yields errors, 
so that a certain percentage of interval distances might be outright wrong [17]). The problem of finding 
a protein structure given all practically available information about the protein is not formally defined, 
but we name it anyway, as the Protein Structure from Raw Data (PSRD) for future reference. 
Several DGP variants discussed in this survey are abstract models for the PSRD. 

The rest of this paper is organized as follows. Sect . 1 1 . ll introduces the mathematical notation and basic 
definitions. Sect. 11.2111.31 present a taxonomy of problems in DG, which we hope will be useful in order 
for the reader not to get lost in the scores of acronyms we use. Sect. [2] presents the main fundamental 
mathematical results in DG. Sect. [3] discusses applications to molecular conformation, with a special focus 
to proteins. Sect. S] surveys engineering applications of DG: mainly wireless sensor networks and statics, 
with some notes on data visualization and robotics. 
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1.1 Notation and definitions 

In this section, we give a list of the basic mathematical definitions employed in this paper. We focus on 
graphs, orders, matrices, realizations and rigidity. 

1.1.1 Graphs 

The main objects being studied in this survey are weighted graphs. Most of the definitions below can be 
found on any standard textbook on graph theory |58j . We remark that we only employ graph theoretical 
notions to define paths (most definitions of paths involve an order on the vertices). 

1. A simple undirected graph G is a couple {V,E) where V is the set of vertices and £^ is a set of 
unordered pairs {u,v} of vertices, called edges. For U CV, we let E[U] = {{u,v} G E \ u,v <E U} 
be the set of edges induced by U. 

2. H = ([/, F) is a subgraph oi G ii U <Z V and F C E[U]. The subgraph iJ of G is induced by U 
(denoted H = G[U]) if F = E[U]. 

3. A graph G ~ {V, E) is complete (or a clique on V) ii E ^ {{w, v} \ u,v G V /\ u ^ v}. 

4. Given a graph G = {V^E) and a vertex v G V, we let Ng{v) = {u G V \ {u,v} e E} be the 
neighbourhood of v and Sciv) = {{u, w} E E \ u ~ v} he the star of v in G. If no ambiguity arises, 
we simply write N(v) and S{v). 

5. We extend Nq and Sg to subsets of vertices: given a graph G = (F, E) and U C V, we let 
Ng{U) = U«GC/ ^civ) be the neighbourhood of U and Sg{U) ~ Utieu ^g{v) be the cutset induced 
by [/ in G. A cutset S{U) is proper iiU ^ and ?7 7^ y. If no ambiguity arises, we write N{U) 
and (5([/). 

6. A graph G = {V, E) is connected if no proper cutset is empty. 

7. Given a graph G = (V, i?) and s, t G V^, a simple path H with endpoints s, t is a connected subgraph 
H = iV',E') of G such that s,t G V, \Nh{s)\ = \NH{t)\ = 1, and \Nh{v)\ = 2 for all w G F'\{s,t}. 

8. A graph G = (V, E) is a simple cycle if it is connected and for all w G F we have |Af(w)| = 2. 

9. Given a simple cycle G = (V , E') in a graph G = (V,E), a chord of G in G is a pair {u, v} such 
that u,v eU and {w, w} G i? \ £". 

10. A graph G = (y, i?) is chordal if every simple cycle G = (F', £") with \E'\ > 3 has a chord. 

11. Given a graph G = (y,-B), {u, i>} e E a.nd z <^ V, the graph G' = {V',E') such that y' = 
{V U {z}) \ {w, v} and £" = (i? U {{w, 0} | u; G A^g(u) U AfG(^^)}) \ v}} is the erfge contraction 
of G w.r.t. {u, w}. 

12. Given a graph G = {V,E), a, minor of G is any graph obtained from G by repeated edge contraction, 
edge deletion and vertex deletion operations. 

13. Unless otherwise specified, we let n = |y| and m = \E\. 

1.1.2 Orders 

Algorithms for realizing graphs in Euclidean spaces are often iterative on the graph vertices, and therefore 
require (or define) a vertex order. The names of the orders listed below refer to acronyms that indicate 
the problems they originate from; the acronyms themselves will be explained in Sect. 11.21 Orders are 
defined with respect to a graph and sometimes an integer (which will turn out to be the dimension of the 
embedding space). 
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1. For any positive integer p G N, wc let [p] = {1, . . . ,p}. 

2. For a set V, a total order < on V, and v € V, we let 7(u) = {u G V \ u < v} he the set of predecessors 
of V w.r.t. <, and let p(v) ~ \^iv)\ + 1 be the rank of u in <. We also define ri{v) ^ {u G V \ v < u} 
to be the set of successors of v w.r.t. <. 

3. The notation N{v) fl 7(w) indicates the set of adjacent predecessors of a vertex v; N{v) D ii{v) 
indicates the set of adjacent successors of v. 

4. It is easy to show that if G = (V, E) is a simple path then there is an order < on V such that for 
all {u, v} G E we have p{u) = p{v) — 1, and that the vertices of minimum and maximum rank in < 
are the endpoints of the path. 

5. A perfect elimination order (PEO) on G = (V, E) is an order on V such that, for each v G V ^ 
G[N{v) n rj{v)] is a clique in G. 

6. A DVOP order on G = {V^E) w.r.t. an integer K G [n] is an order on V where (a) the first K 
vertices induce a clique in G and (b) each v gV rank p{v) > K has \N{v) n 7(f )| > K . 

7. A Henneberg type I order is a DVOP order where each v with p{v) > K has \N{v) H 7(w)| = K. 

8. A K -trilateration (or K -trilaterative) order is a DVOP order where (a) the first K + 1 vertices 
induce a clique in G and (b) each v with p{v) > K + 1 has \N{v) D 7(u)| > A' + 1. 

9. A DDGP order is a DVOP order where for each v with p{v) > K there exists Uy C N{v) n 7(f) 
with |t/u| = K and G[?7t,] a clique in G. 

10. A ^DMDGP order is a DVOP order where, for each v with p{v) > K . there exists Uy C N{v) r\^{v) 
with (a) |C/„| = (b) G[C/„] a clique in G, (c) Vu e C/„ - A' - 1 < p{u) < p{v) - 1). 

Directly from the definitions, it is clear that: 

• ''DMDGP orders are also DDGP orders; 

• DDGP, AT-trilateration and Henneberg type I orders are also DVOP orders; 

• '^DMDGP orders on graphs with a minimal number of edges are inverse PEOs where each clique 
of adjacent successors has size K\ 

• A'-trilateration orders on graphs with a minimal number of edges are inverse PEOs where each 
clique of adjacent successors has size A" + 1. 

Furthermore, it is easy to show that DDGP, AT-trilateration and Henneberg type I orders have a non- 
empty symmetric difference, and that there are PEO instances not corresponding to any inverse '^DMDGP 
or AT-trilateration orders. 

1.1.3 Matrices 

The incidence and adjacency structures of graphs can be well represented using matrices. For this reason, 
DG problems on graphs can also be seen as problems on matrices. 

1. A distance space is a pair {X, d) where X C M.^ and d : X x X ^ K-|_ is a distance function (i.e., a 
metric on X). 

2. A distance matrix for a finite distance space {X = {xi, . . . , a;„}, d) is the n x n square matrix 
E> = {duv) where for all u,v < \X\ we have d.^v = d{xu, x^). 
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3. A partial matrix on a field F is a pair (A, S) where A = (aij) is an m x n matrix on F and S 
is a set of pairs (i,j) with i < m and j < n; the completion of a partial matrix is a pair (a,B), 
where a : S' — )■ F and B = [bij) is an m x n matrix on F, such that V(i, j) € S {hij = a^j) and 
V(i, j) ^ S" {bij = a,j). 

4. An n X n matrix D = (dij) is a Euclidean distance matrix if there exists an integer K > and a 
set X = {si, . . . ,Xn} Q K-^ such that for all i,j<n we have = ||a;i — Xj\\. 

5. An n X n symmetric matrix A = (aij) is positive semidefinite if all its eigenvalues arc nonnegative. 

6. Given two n x n matrices A = (aij), B = (bij), the Hadamard product C = A o B is the n x n 
matrix C = (cij) where Cij = aijbij for all i, j < n. 

7. Given two n x n matrices A = (aij), B = (bij), the Frobenius (inner) product C = A • i? is defined 
as trace(ATB) = Y.i,j<n "ij^u - 

1.1.4 Realizations and rigidity 

The definitions below give enough information to define the concept of rigid graph, but there are several 
definitions concerning rigidity concepts. For a more extensive discussion, see Sect. 14.21 

1. Given a graph G = {V,E) and a manifold M C M^, a function x : G — > 7\if is an embedding 
of G in M if; (i) a; maps to a set of n points in M; (ii) x maps E to a, set of m simple arcs 
(i.e. homeomorphic images of [0, 1]) in M; (iii) for each S E, the endpoints of the simple arc 
Xuv are a;„ and Xv We remark that x can also be seen as a vector in R"^ or as an A' x n real 
matrix. 

2. An embedding such that M = and the simple arcs are line segments is called a realization of 
the graph in M^. A realization is valid if it satisfies Eq. ([1}. In practice wc ignore the action of x 
on E and only denote realizations as functions x :V ^ M^. 

3. Two realizations a;, y of a graph G = (V, £') are congruent if for every u, w G 1^ we have ||a;„ — = 
WUu — VvW- If x,y are not congruent then they are incongruent. If i? is a rotation, translation or 
reflection and Rx = [Rxi, . . . , Rxn), then Rx is congruent to x |28| . 

4. A framework in is a pair (G, x) where a; is a realization of G in R^. 

5. A displacement of a framework (G,x) is a continuous function y : [0,1] R"^^ such that: (i) 
2/(0) = x; (ii) y{t) is a valid realization of G for all t G [0, 1]. 

6. A flexing of a framework (G,.t) is a displacement y of x such that y{t) is incongruent to x for any 

te(0,i]. 

7. A framework is flexible if it has a flexing, otherwise it is rigid. 

8. Let (G, x) be a framework. Consider the linear system Ra = 0, where R is the m x nii' matrix each 
{u, u}-th row of which has exactly 2K nonzero entries Xui — Xm and Xyi — x^i (for {it, v} £ E and 
i < K), and a G R"^ is a vector of indcterminates. The framework is infinitesimally rigid if the 
only solutions of Ra = are translations or rotations |208| . and infinitesimally flexible otherwise. 
By [7H1 Thm. 4.1], infinitesimal rigidity implies rigidity. 

9. By |91l Thm. 2.1], if a graph has a unique infinitesimally rigid framework, then almost all its 
frameworks are rigid. Thus, it makes sense to define a rigid graph as a graph having an infinitesimally 
rigid framework. The notion of a graph being rigid independently of the framework assigned to it 
is also known as generic rigidity [15] . 
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A few remarks on the concept of embedding and congruence, which are of paramount importance 
throughout this survey, are in order. The definition of an embedding (Item [1]) is similar to that of a 
topological embedding. The latter, however, also satisfies other properties: no graph vertex is embedded 
in the interior of any simple arc (Vw G V, {w, w} G E (xy ^ Xuw)^ where 5*° is the interior of the set S), 
and no two simple arcs intersect (V{u, v} ^ {v, z} S i? H = 0)). The graph embedding problem 
on a given manifold, in the topological sense, is the problem of finding a topological embedding for a 
graph in the manifold: the constraints arc not given by the distances, but rather by the requirement 
that no two edges must be mapped to intersecting simple arcs. Garcy and Johnson list a variant of this 
problem as the open problem GRAPH Genus [751 OPENS]. The problem was subsequently shown to be 
NP-complete by Thomassen in 1989 ^1Q\- 




The definition of congruence concerns pairs of points: two distinct pairs of points {xi , X2 } and {yi , ?/2 } 
are congruent if the distance between xi and X2 is equal to the distance between yi and ?/2- This definition 
is extended to sets of points X,Y m a natural way: X and Y arc congruent if there is a surjective function 
f : X Y such that each pair {xi, X2} C X is congruent to {/(xi), f{x2)}- Set congruence implies that 
/ is actually a bijection; moreover, it is an equivalence relation [28l Ch. II §12]. 

1.2 A taxonomy of problems in distance geometry 

Given the broad scope of the presented material (and the considerable number of acronyms attached to 
problem variants), we believe that the reader will appreciate this introductory taxonomy, which defines 
the problems we shall discuss in the rest of this paper. Fig. [T] contains a graphical depiction of the 
logical/topical existing relations between problems. Although some of our terminology has changed from 
past papers, we are now attempting to standardize the problem names in a consistent manner. 

We sometimes emphasize problem variants where the dimension K is "fixed" . This is common in 
theoretical computer science: it simply means that K is a given constant which is not part of the 
problem input. The reason why this is important is that the worst-case complexity expression for the 
corresponding solution algorithms decreases. For example, in Sect. I3.3T^ we give an 0(n^+'^) algorithm 
for a problem parametrized on K. This is exponential time whenever K is part of the input, but it 
becomes polynomial when if is a fix;ed constant. 

1. Distance Geometry Problem (DGP) [28l Ch. IV §36-42], |121j : given an integer K > and 
a nonnegatively weighted simple undirected graph, find a realization in M.^ such that Euclidean 
distances between pairs of points are equal to the edge weigths (formal definition in Sect. [1]). We 
denote by DGPa' the subclass of DGP instances for a fixed K. 

2. Protein Structure from Raw Data (PSRD): we do not mean this as a formal decision problem, 
but rather as a practical problem, i.e. given all possible raw data concerning a protein, find the 
protein structure in space. Notice that the "raw data" might contain raw output from the NMR 
machinery, covalent bonds and angles, a subset of torsion angles, information about the secondary 
structure of the protein, information about the potential energy function and so on |189j (discussed 



3. Molecular Distance Geometry Problem (MDGP) [501 §1-3], |141[ : same as DGP3 (discussed 



4. DiSCRETiZABLE DISTANCE GEOMETRY PROBLEM (DDGP) [115j : subsct of DGP instances for 
which a vertex order is given such that: (a) a realization for the first K vertices is also given; (b) 
each vertex v of rank >K has >K adjacent predecessors (discussed in Sect. I3.33| . 

5. DISCRETIZABLE DISTANCE GEOMETRY PROBLEM in fixed dimension (DDGPk) jl59| : subset of 
DDGP for which the dimension of the embedding space is fixed to a constant value K (discussed 
in Sect. 13.3.5)) . The case K = 3 was specifically discussed in |159) . 




above) . 



in Sect.|S2]). 
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Discretizable MDGP (a.k.a. GDMDGP [145j) 
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DMDGP in fixed dimension I140l 
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DMDGPa with K = ?, [122J 
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interval DGP [50] 


iMDGP 


interval MDGP |155| 
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interval DMDGP jl23| 


Vertex orders 


DVOP 


Discretization Vertex Order Problem |115| 


A'-TRILAT 


.ftT-Trilateration order problem 1691 


Applications 


PSRD 


Protein Structure from Raw Data 


MDS 


Multi-Dimensional Scaling 


WSNL 
IKP 


Wireless Sensor Network Localization 12321 
Inverse Kinematic Problem |2llj 


Mathematics 



GRP 

MCP 
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Graph Rigidity Problem [233] 
Matrix Completion Problem [113] 
Euclidean Distance Matrix problem [28] 
Euclidean Distance MCP [ITT] 
Positive Semi-Definite determination |112j 
Positive Semi-Definite MCP fTTTl 



Figure 1: Relation map for problems related to distance geometry. 
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6. Discretization Vertex Order Problem (DVOP) [115j : given an integer K > and a simple 
undirected graph, find a vertex order such that the first K vertices induce a chque and each vertex 
of rank >K has >K adjacent predecessors (discussed in Sect. 13. 3T^ . 

7. if-TRiLATERATiON Order problem (A'-TRILAT) [63: like the DVOP, with "if" replaced by "X + l" 
(discussed in Sect. 13.3]) . 

8. DiSCRETIZABLE MOLECULAR DISTANCE GEOMETRY PROBLEM ('^DMDGP) jl45| : Subsct of DDGP 

instances for which the K immediate predecessors of v are adjacent to v (discussed in Sect. [X^ . 

9. DiSCRETIZABLE MOLECULAR DISTANCE GEOMETRY PROBLEM in fixed dimension (DMDGPk) 
[144] : subset of '^DMDGP for which the dimension of the embedding space is fixed to a constant 
value K (discussed in Sect. 13. 3|) . 

10. DiSCRETIZABLE MOLECULAR DISTANCE GEOMETRY PROBLEM (DMDGP) |122| : the DMDGPif 

with K = 3 (discussed in Sect. 13. 3p . 

11. INTERVAL Distance Geometry Problem (iDGP) [501 I121j : given an integer K > and a 
simple undirected graph whose edges are weighted with intervals, find a realization in such that 
Euclidean distances between pairs of points belong to the edge intervals (discussed in Sect. 13. 4| ). 

12. INTERVAL Molecular Distance Geometry Problem (iMDGP) [1551 1121| : the iDGP with 
K = 3 (discussed in Sect. 13. 4p . 

13. INTERVAL DiSCRETIZABLE MOLECULAR DISTANCE GEOMETRY PROBLEM (iDMDGP) |166| : given: 

(i) an integer K > 0; (ii) a simple undirected graph whose edges can be partitioned in three sets 
En, Es, Ej such that edges in Ejy are weighted with nonnegative scalars, edges in Es are weighted 
with finite sets of nonnegative scalars, and edges in Ej are weighted with intervals; (iii) a vertex 
order such that each vertex v of rank > K has at least K immediate predecessors which are adjacent 
to V using only edges in E^ U Es, find a realization in such that Euclidean distances between 
pairs of points are equal to the edge weights (for edges in En), or belong to the edge set (for edges 
in Es), or belong to the edge interval (for edges in Ej) (discussed in Sect. 13. 4p . 

14. Wireless Sensor Network Localization problem (WSNL) [25^ fWl ISS] : like the DGP, but 
with a subset A of vertices (called anchors) whose position in is known a priori (discussed in 
Sect. 14. 1|) . The practically interesting variants have K fixed to 2 or 3. 

15. Inverse Kinematic Problem (IKP) [211j : subset of WSNL instances such that the graph is a 
simple path whose endpoints arc anchors (discussed in Sect. 14.3.21) . 

16. Multi-Dimensional Scaling problem (MDS) [SS]: given a set X of vectors, find a set Y of smaller 
dimensional vectors (with \X\ = \Y\) such that the distance between the i-th and j-th vector of Y 
approximates the distance of the corresponding pair of vectors of X (discussed in Sect. I4.3.T|) . 

17. Graph Rigidity Problem (GRP) |233[ llllj : given a simple undirected graph, find an integer 
K' > such that the graph is (generically) rigid in M.^ for all K > K' (discussed in Sect. 14. 2|) . 

18. Matrix Gompletion Problem (MCP) [113j : given a square "partial matrix" (i.e. a matrix with 
some missing entries) and a matrix property P, determine whether there exists a completion of the 
partial matrix that satisfies P (discussed in Sect. [2]). 

19. Euclidean Distance Matrix problem (EDM) [28]: determine whether a given matrix is a Eu- 
clidean distance matrix (discussed in Sect. [5]). 

20. Euclidean Distance Matrix Gompletion Problem (EDMCP) pTTlfTT^igS] : subset of MCP 
instances with P corresponding to "Euclidean distance matrix for a set of points in for some 
A'" (discussed in Sect. [2]). 

21. Positive Semi-Definite determination (PSD) |112j : determine whether a given matrix is positive 
semi-definite (discussed in Sect. [5]) • 
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22. Positive Semi-Definite Matrix Completion Problem (PSDMCP) |llll 11121 1^ : subset of 
MCP instances with P corresponding to "positive semi-definite matrix" (discussed in Sect.[2|). 



1.3 DGP variants by inclusion 

The research carried out by the authors of this survey focuses mostly on the subset of problems in the 
Distance Geometry category mentioned in Fig. [TJ These problems, seen as sets of instances, are related 
by the inclusionwise lattice shown in Fig. [5] For reasons relating to our own development of these ideas, 
the names of some problems in this paper are different than those given in previously published papers; 
the definitions, however, coincide. 



iDGP 




DGP iMDGP 




DMDGPif DMDGP 



Figure 2: Inclusionwise lattice of DGP variants (arrows mean c). 



2 The mathematics of distance geometry 

This section will briefly discuss some fundamental mathematical notions related to DG. As is well known, 
DG has strong connections to matrix analysis, semidefinite programming, convex geometry and graph 
rigidity [53|. On the other hand, the fact that Godel discussed extensions to differentiable manifolds is 
perhaps less known (Sect. [52]), as well as perhaps the exterior algebra formalization (Sect. [575]) . 

Given a set Z^/ = {po, . . . ,pk} of iiT -|- 1 points in C M.^ , the volume of the i^-simplex defined by the 
points in U is given by the so-called Cayley-Menger formula [1511 11521 : 



A^(^) = \/^K^CM(ZY), (2) 
where CM{U) is the Cayley-Menger determinant P^ITS^I^ : 
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with duv = \\Pu ^ Pv\\ for all u^v £ {0,...,if}. The Cayley-Menger determinant is proportional to 
the quantity known as the oriented volume [50) (sometimes also called the signed volume), which plays 
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an important role in the theory of oriented matroids [27], Opposite signed values of simplex volumes 
correspond to the two possible orientations of a simplex keeping one of its facets fixed (see e.g. the two 
positions for vertex 4 in Fig. 01 center). In [231] . a generalization of DG is proposed to solve spatial 
constraints, using an extension of the Cayley-Menger determinant. 

2.1 The Euclidean Distance Matrix problem 

Cayley-Menger determinants were used in [28] to give necessary and sufficient conditions for the EDM 
problem, i.e. determining whether for a given n x n matrix D = (dij ) there exists an integer K and a set 
{pi, . . . ,Pn} of points of such that dij = \\pi — Pj\\ for all i,j < n. Necessary and sufficient conditions 
for a matrix to be a Euclidean distance matrix are given in jl99j . 

Theorem 2.1 (Thm. 4 in |199j ) A n x n distance matrix D is embeddable in M.^ but not in IR-'^"^ if 
and only if: (i) there is a principal {K + 1) x + 1) submatrix R of D with nonzero Cayley-Menger 
determinant; (ii) for ^ S {1,2}, every principal (K + fi) x {K + fi) submatrix of D containing R has zero 
Cayley-Menger determinant. 

In other words, the two conditions of this theorem state that there must be a JC-simplex S of reference 
with nonzero volume in M^, and all {K + 1)- and {K + 2)-simplices containing S' as a face must be 
contained in M.^ . 

2.2 Differentiable manifolds 

Condition (ii) in Thm. [^TTl fails to hold in the cases of (curved) manifolds. Godel showed that, for K = 3, 
the condition can be updated as follows (paper 1933h in |71j): for any quadruplet lAn of point sequences 
p" (for w e {0, . . . , 3}) converging to a single non-degenerate point po, the following holds: 

CM(^„) 
fini ,, = 0. 

In a related note, Godel also showed that if W = {po: • • ■ iPs} with QM{U) ^ 0, then the distance 
matrix over lA can be realized on the surface of a 2-sphere where the distances between the points are 
the lengths of the arcs on the spherical surface (paper 1933b in [71]). This observation establishes a 
relationship between DG and the Kissing Number Problem |108| and, more in general, to coding theory 

2.3 Exterior algebras 

Cayley-Menger determinants are exterior products [11]. The set of all possible exterior products of 
a vector space forms an exterior algebra, which is a special type of Clifford algebra [ID]; specifically, 
exterior algebras are tensor algebras modulo the ideal generated by x^. The fact that any square element 
of the algebra is zero implies = (x + y)^ = .t^ + xy -\- yx -\- y'^ = xy -\- yx, and hence xy = —yx. 
Accordingly, exterior algebras are used in the study of alternating multilinear forms. The paper |65j gives 
an in-depth view of the connection between DG and Clifford algebras. 

In the setting of distance geometry, we define the product of vectors xi, . . . ,x„ e (for n > K) 
by the corresponding Cayley-Menger determinant onU = {xg, . . . ,Xn} where xq is the origin. It is clear 
that, if Xi = Xj for some i ^ j, then the corresponding n-simplcx is degenerate and certainly has volume 
in R^ (even ii n = K), hence CM{h{) = 0. Equivalently, if a product fl- Xi can be written as x| J| Xi, 
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then it belongs to the ideal (x^) and is replaced by in the exterior algebra. This immediately implies 
that the Cayley-Menger determinant is an alternating form. 

Abstract relationships between an exterior algebra and its corresponding vector space are specialized 
to relationships between Cayley-Menger determinants and vectors in M^. Thus, for example, one can 
derive a well-known result in linear algebra: xi, . . . , xk are linearly independent if and only if CM{U) ^ 
where U = \xq, . . . ^xk\ with xq being the origin |111 140] . A more interesting example consists in 
deriving certain invariants expressed in Pliicker coordinates |40] : given a basis xi,. . . ,xk of M.^ and a 
basis Ui, . . . ,yh of M'' where h < K, it can be shown that for any subset S of {1, . . . , K} of size h there 
exist constants as such that 05 = Yl Hi- our setting, product vectors correspond to Cayley- 

iGS i<h 

Menger determinants derived from the given points xi, . . . ,xk and an origin xq. It turns out that the 
ratios of various ag's are invariant over different bases yj, . . . , j/^ of K'', which allows their employment 
as a convenient coordinate system for M.^. Invariants related to the Pliicker coordinates are exploited in 
[50] to find realizations of chirotopes (orientations of vector configurations [27]). 

2.4 Bideterminants 

For sets of more than K + 1 points, the determination of the relative orientation of each ii'-simplex 
in function of a iiT-simplex of reference (see e.g. Fig. |S1 center and right) is important. Such relative 
orientations are given by the Cayley-Menger bideterminant of two X-simplices U = {po, . . . ,pk} and 
V = {qo, . . ■,qK}, with d^j = \\pi - qj\\: 
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These bideterminants allow, for example, the determination of stercoisomctries in chemistry [27) . 
2.5 Positive semideflnite and Euclidean distance matrices 

Schoenberg proved in |190| that there is a one-to-one relationship between Euclidean distance matrices 
and positive semideflnite matrices. Let D = (dij) be an {n + I) x (n + 1) matrix and A ~ (flij) be the 
(n + 1) X (n + l) matrix given by a,j = ^{d^^ + d^j - d.^ ). 

The bijection given by Thm. 12.21 below can be exploited to show that solving the PSD and the EDM 
is essentially the same thing [198] . 

Theorem 2.2 (Thm. 1 in |198] ) A necessary and sufficient condition for the matrix D to be a Eu- 
clidean distance matrix with respect to a set lA — {poi ■•■,?'«} of points in but not in R^-i is that 
the quadratic form x^ Ax (where A is given above) is positive semideflnite of rank K. 

Schoenbcrg's theorem was cast in a very compact and elegant form in [51] : 

EDM = §^ n (S^ - §+), (5) 

where EBM is the set of n x n Euclidean distance matrices, § is the set of n x n symmetric matrices, 
Sft is the projection of § on the subspace of matrices having zero diagonal, §c is the kernel of the matrix 
map y — J> yi (with 1 the all-one n- vector), S^f is the orthogonal complement of §c, and S+ is the set of 
symmetric positive semideflnite n x n matrices. The matrix representation in ([5]) was exploited in the 
Alternating Projection Algorithm (APA) discussed in Sect. 13.4.41 
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2.6 Matrix completion problems 

Given an appropriate property P applicable to square matrices, the Matrix Completion Problem (MCP) 
schema ask whether, given an n x n partial matrix A', this can be completed to a matrix A such that 
P{A) holds. MCPs are naturally formulated in terms of graphs: given a weighted graph G = {V,E,a'), 
with a' : E' — > M, is there a complete graph K on V (possibly with loops) with an edge weight function 
a such that Guv = a'uv for ''^) ^ 

MCPs are an interesting class of inverse problems which find applications in the analysis of data, 
such as for example the reconstruction of 3D images from several 2D projections on random planes in 
cryo-electron microscopy |197| . When P{A) is the (informal) statement "A has low rank", there is an 
interesting application is to recommender systems: voters submit rankings for a few items, and consistent 
rankings for all items are required. Since few factors are believed to impact user's preferences, the data 
matrix is expected to have low rank |196) . 

Two celebrated specializations of this problem schema are the Euclidean Distance MCP (EDMCP) and 
the Positive Scmidefinite MCP (PSDMCP). These two problems have a strong link by virtue of Thm.[221 
and, in fact, there is a bijection between EDMCP and PSDMCP instances [111) . MCP variants where a'^j 
is an interval and the condition (i) is replaced by atj £ also exist (see e.g. |95] . where a modification 
of the EDMCP in this sense is given). 

2.6.1 Positive semidefinite completion 

Laurent }112| remarks that the PSDMCP is an instance of the Semidefinite Programming (SDP) feasibility 
problem: given integral nx n symmetric matrices Qo, - ■ ■ iQrm determine whether there exist scalars 
zi, . . . , Zm satisfying Qo + J2 -^iQi — ^- Thus, by Thm. 12.21 the EDMCP can be seen as an instance of 

the SDP feasibility problem too. The complexity status of this problem is currently unknown, and in 
particular it is not even known whether this problem is in NP. The same holds for the PSDMCP, and of 
hence also for the EDMCP. If one allows £-approximate solutions, however, the situation changes. The 
following SDP formulation correctly models the PSDMCP: 



max 




> 




A = {a,j) 


^ 

~ , ) 








\f{iJ}eE 


aij 


= <3- . 



Accordingly, SDP-bascd formulations and techniques are common in DC (see Sect. 14. iT^ . 

Polynomial cases of the PSDMCP are discussed in |1111 1112] (and citations therein). These include 
chordal graphs, graphs without minors, and graphs without certain induced subgraphs (e.g. wheels 
Wn with n > 5). Specifically, in |112| it is shown that if a graph G is such that adding m edges makes 
it chordal, then the PSDMCP is polynomial on G for fixed m. All these results naturally extend to the 
EDMCP. 

Another interesting question is, aside from actually solving the problem, to determine conditions on 
the given partial matrix to bound the cardinality of the solution set (specifically, the cases of one or 
finitely many solutions are addressed). This question is addressed in [95], where explicit bounds on the 
number of non-diagonal entries of A' are found in order to ensure uniqueness or finiteness of the solution 
set. 
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2.6.2 Euclidean distance completion 

The EDMCP differs from the DGP in that the dimension K of the embedding space is not provided 
as part of the input. An upper bound to the minimum possible K that is better than the trivial one 
{K < n) was given in [13] as: 

K< ^-\ (6, 

Because of Thm. [221 the EDMCP inherits many of the properties of the PSDMCP. We believe that 
Menger was the first to explicitly state a case of EDMCP in the literature: in |151l p. 121] (also see 
jl52l p. 738]) he refers to the matrices appearing in Cayley- Menger determinants with one missing entry. 
These, incidentally, are also used in the dual Branch-and-Prune (BP) algorithm (see Sect. I3.3.7.1[) . 

As mentioned in Sect. 12.6.11 the EDMCP can be solved in polynomial time on chordal graphs G = 
{V, E) [STlllll] . This is because a graph is chordal if and only if it has a perfect elimination order (PEO) 
[55] . i.e. a vertex order on V such that, for all v € V, the set of adjacent successors N{v) fl r]{v) is a 
chque in G. PEOs can be found in 0{\V\ + \E\) |180j . and can be used to construct a sequence of graphs 
G = {V,E) = Go,Gi, . . . ,Gs where Gs is a clique on V and E{Gi) = E{Gi-i) U {{u,w}}, where u is 
the maximum ranking vertex in the PEO of Gi_i such that there exists v £ r]{u) with {it, u} ^ E{Gi-i). 
Assigning to {u,v} the weight duv = V c^im + guarantees that the weighted (complete) adjacency 
matrix of Gg is a distance matrix completion of the weighted adjacency matrix of G, as required |87) . 
This resuh is introduced in [57] (for the PSDMCP rather than the EDMCP) and summarized in [TTT] . 



3 Molecular Conformation 

According to the authors' personal interest, this is the largest section in the present survey. DG is mainly 
(but not exclusively [25]) used in molecular conformation as a model of an inverse problem connected to 
the interpretation of NMR data. We survey continuous search methods, then focus on discrete search 
methods, then discuss the extension to interval distances, and finally present recent results specific to the 
NMR application. 



3.1 Test instances 

The methods described in this section have been empirically tested according to different instance sets 
and on different computational tcstbeds, so a comparison is difficult. In general, researchers in this area 
try to provide a "realistic" setting; the most common choices are the following. 

• Geometrical instances: instances are generated randomly from a geometrical model that is also 
found in nature, such as grids }154j . 

• Random instances: instances are generated randomly from a physical model that is close to 
reality, such as [TTillT^ . 

• Dense PDB instances: real protein conformations arc downloaded from the Protein Data Bank 
(PDB) [12], and then, for each residue, all within- residue distances as well as all distances between 
each residue and its two neighbours are generated [1551 [3] [4] ; 

• Sparse PDB instances: real protein conformations are downloaded from the Protein Data Bank 
(PDB) [IH], and then all distances within a given threshold arc generated [5?1 1122| . 

When the target application is the analysis of NMR data, as in the present case, the best test setting is 
provided by sparse PDB instances, as NMR can only measure distances up to a given threshold. Random 
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instances are only useful when the underlying physical model is meaningful (as is the case in |114j ). 
Geometrical instance could be useful in specific cases, e.g. the analysis of crystals. The problem with 
dense PDB instances is that, using the notions given in Sect. 13.31 and the fact that a residue contains 
more than 3 atoms, it is easy to show that the backbone order on these protein instances induces a 
3-trilateration order in M.^ (see Sect. [n~T|) . Since graphs with such orders can be realized in polynomial 
time [69j . they do not provide a particularly hard class of test instances. Moreover, since there are 
actually nine backbone atoms in each set of three consecutive residues, the backbone order is actually a 
7-trilateration order. In other words there is a surplus of distances, and the problem is overdetermined. 

Aside from a few early papers (e.g. |117l 11381 1139j ) we (the authors of this survey) always used test 
sets consisting mostly of sparse PDB instances. We also occasionally used geometric and (hard) random 
instances, but never employed "easy" dense PDB instances. 

3.1.1 Test result evaluation 

The test results always yield: a realization x for the given instance; accuracy measures for x, which 
quantify either how far is x from being valid, or how far is x from a known optimal solution; and a 
CPU time taken by the method to output x. Optionally, certain methods (such as the BP algorithm, see 
Sect. I3.3.6P might also yield a whole set of valid realizations. Different methods are usually compared 
according to their accuracy and speed. 

There are three popular accuracy measures. The penalty is the evaluation of the function defined in 
^ for a given realization x. The Largest Distance Error (LDE) is a scaled, averaged and square-rooted 
version of the penalty, given by X]{u v}£E ^"^^ — ^^^^ The Root Mean Square Deviation (RMSD) is 
a difference measure for sets of points in Euclidean space having the same center of mass. Specifically, if 
x, y are embeddings of G = [V, E), then RMSD(.t, y) = min^ \\y — Tx\\, where T varies over all rotations 
and translations in M^. Accordingly, if y is the known optimal configuration of a given protein, different 
realizations of the same protein yield different RMSD values. Evidently, RMSD is a meaningful accuracy 
measure only for test sets where the optimal conformations are already known (such as PDB instances). 

3.2 The Molecular Distance Geometry Problem 

The MDGP is the same as DGP3. The name "molecular" indicates that the problem originates from the 
study of molecular structures. 

The relationship between molecules and graphs is probably the deepest one existing between chemistry 
and discrete mathematics: a wonderful account thereof is given in [1^1 Ch. 4]. Molecules were initially 
identified by atomic formultE (such as H2O) which indicate the relative amounts of atoms in each molecule. 
When chemists started to realize that some compounds with the same atomic formula have different 
physical properties, they sought the answer in the way the same amounts of atoms were linked to each 
other through chemical bonds. Displaying this type of information required more than an atomic formula, 
and, accordingly, several ways to represent molecules using diagrams were independently invented. The 
one which is still essentially in use today, consisting in a set of atom symbols linked by segments, is 
originally described in [34) . The very origin of the word "graph" is due to the representation of molecules 

The function of molecules rests on their chemical composition and three-dimensional shape in space 
(also called structure or conformation). As mentioned in Sect. [TJ NMR experiments can be used to 
determine a subset of short Euclidean distances between atoms in a molecule. These, in turn, can be 
used to determine its structure, i.e. the relative positions of atoms in M'^. The MDGP provides the 
simplest model for this inverse problem: V models the set of atoms, E the set of atom pairs for which a 
distance is avaiable, and the function d : E ^ M+ assigns distance values to each pair, so that G = {V, E) 
is the graph of the molecule. Assuming the input data is correct, the set X of solutions of the MDGP on 
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G will yield all the structures of the molecule which are compatible with the observed distances. 

In this section we review the existing methods for solving the MDGP with exact distances on general 
molecule graphs. 

3.2.1 General-purpose approaches 

Finding a solution of the set of nonlinear equations ([IJ poses several numerical difhculties. Recent 
(unpublished) tests performed by the authors of this survey determined that tiny, randomly generated 
weighted graph instances with fewer than 10 vertices could not be solved using Octave's nonlinear equation 
solver f solve [66]. Spatial Branch-and-Bound (sBB) codes such as Couenne [Ij could solve instances 
with \V\ G {2,3,4} but no larger in reasonable CPU times: attaining feasibility of local iterates with 
respect to the nonlinear manifold defined by ([T]) is a serious computational challenge. This motivates the 
following formulation using Mathematical Programming (MP): 



The Global Optimization (GO) problem Q aims to minimize the squared infeasibility of points in M.^ 
with respect to the manifold ((T|). Both terms in the squared difference are themselves squared in order 
to decrease floating point errors (NaN occurrences) while evaluating the objective function of ([7]) when 
\\xu — Xy\\ is very close to 0. We remark that ([7|) is an unconstrained nonconvex Nonlinear Program 
(NLP) whose objective function is a nonnegative polynomial of fourth degree, with the property that 
X S X if and only if the evaluation of the objective function at x yields 0. 

In |117j , we tested formulation (O and some variants thereof with three GO solvers: a Multi-Level 
Single Linkage (MLSL) multi-start method [109] . a Variable Neighbourhood Search (VNS) meta-heuristic 
for nonconvex NLPs [134] , and an early implementation of sBB [1461 11321 1135j (the only solver in the set 
that guarantees global optimality of the solution to within a given £ > tolerance). We found that it 
was possible to solve artificially generated, but realistic protein instances |114j with up to 30 atoms using 
the sBB solver, whereas the two stochastic heuristics could scale up to 50 atoms, with VNS yielding the 
best performance. 

3.2.2 Smoothing based methods 

A smoothing of a multivariate multimodal function f{x) is a family of functions Fx{x) such that Fq(x) = 
f{x) for all X G M^^ and F\{x) has a decreasing number of local optima as A increases. Eventually F\ 
becomes convex, or at least invex [15j . and its optimum x^ can be found using a single run of a local 
NLP solver. A homotopy continuation algorithm then traces the sequence a;^ in reverse as A — ?> 0, by 
locally optimizing Fx-a\{x) for a given step AA with x'^ as a starting point, hoping to identify the global 
optimum x* of the original function /(x) |102| . A smoothing operator based on the many-dimensional 
diffusion equation AF = where A is the Laplacian X]i<n^^/^^?' derived in |102] as the Fouricr- 
Poisson formula 



also called Gaussian transform in jl54j . The Gaussian transform with the homotopy method provides a 
successful methodology for optimizing the objective function: 




{u,v}eE 



(7) 




(8) 




(9) 



{u,v}eE 



where a; G M^. More information on continuation and smoothing-based methods applied to the iMDGP 
can be found in Sect. 13.41 
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In [154) . it is shown that the closed form of the Gaussian transform apphed to (O is: 

{f)x^f{x) + 10\' J2 {\\xu-x4'-6dt\^) + 15X'\E\. (10) 

{u,v}eE 

Based on this, a continuation method is proposed and successfuhy tested on a set of cubical grids. The 
implementation of this method, DGSOL, is one of the few MDGP solution codes that are freely available 
(source included): sec http : //www.mcs . anl .gov/~more/dgsol/, DGSOL has several advantages: it is 
efficient, effective for small to medium-sized instances, and, more importantly, can naturally be extended 
to solve zMDGP instances (which replace the real edge weights with intervals). The one disadvantage 
we found with DGSOL is that it does not scale well to large-sized instances: although the method is 
reasonably fast even on large instances, the solution quality decreases. On large instances, DGSOL 
often finds infeasibilities that denote not just an offset from an optimal solution, but a completely wrong 
conformation (see Fig. [3]) . 




Figure 3: Comparison of a wrong molecular conformation for Imbn found by DGSOL (left) with the 
correct one found by the BP Alg. [1] (right). 

In [3l|4] an exact reformulation of a Gaussian transform of ([7]) as a difference of convex (d.c.) functions 
is proposed, and then solved using a method similar to DGSOL, but where the local NLP solution is carried 
out by a different algorithm, called DCA. Although the method does not guarantee global optimality, 
there are empirical indications that the DCA works well in that sense. This method has been tested on 
three sets of data: the artificial data from More and Wu |154j (with up to 4096 atoms), 16 proteins in the 
PDB [T^ (from 146 up to 4189 atoms), and the data from Hendrickson [92] (from 63 up to 777 atoms). 

In |139| . VNS and DGSOL were combined into a heuristic method called Double VNS with Smoothing 
(DVS). DVS consists in running VNS twice: first on a smoothed version {f)x of the objective function 
f{x) of (O, and then on the original function /(x) with tightened ranges. The rationale behind DVS 
is that (/)a is easier to solve, and the homotopy defined by A should increase the probability that the 
global optimum of (/)a is close to the global optimum x* of f{x). The range tightening that allows 
VNS to be more efficient in locating x* is based on a "Gaussian transform calculus" that gives explicit 
formula that relate {f)x to f{x) whenever A and d change. These formula are then used to identify 
smaller ranges for x* . DVS is more accurate but slower than DGSOL. 

It is worth remarking that both DGSOL and the DCA methods were tested using (easy) dense PDB 
instances, whereas the DVS was tested using geometric and random instances (see Sect. 13. 1|) . 
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3.2.3 Geometric build-up methods 

In [53], a combinatorial method called geometric build-up (GB) algorithm is proposed to solve the MDGP 
on sufficiently dense graphs. A subgraph H of G, initially chosen to only consist of four vertices, is given 
together with a valid realization a;. The algorithm proceeds iteratively by finding Xy for each vertex 
V G V{G) \ V{H). When is determined, v and Sniv) are removed from G and added to H. For this 
to work, at every iteration two conditions must hold; 

1. \SHiv)\ >4; 

2. at least one subgraph H' of H, with V{H') = {wi, M2, U3, 1x4} and = 4, must be such that 
the realization x restricted to H' is non-coplanar. 

These conditions ensure that the position Xy can be determined using triangulation. More specifically, 
let x\h' = {xui I * < 4} C R'^. Then x^ is a solution of the following system: 
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Squaring both sides of these equations, wc have: 
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By subtracting one of the above equations from the others, one obtains a linear system that can be used 
to determine Xy. For example, subtracting the first equation from the others, we obtain 

Ax = b, (11) 

where 
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and 

UL,-diJ~kxuA\i-\\xujn . 

Since x„j , , x„3 , a:^^ are non-coplanar, ([TT|) has a unique solution. 

The GB is very sensitive to numerical errors [53]. In |226j . Wu and Wu propose an updated GB 
algorithm where the accumulated errors can be controlled. Their algorithm was tested on a set of sparse 
PDB instances consisting of 10 proteins with 404 up to 4201 atoms. The results yielded RMSD measures 
ranging from O(10~^) to O(10~^'^). It is interesting to remark that if G is a complete graph and duv G Q+ 
for all {u,v} G E, this approach solves the MDGP in linear time 0{n) [53]. A more complete treatment 
of MDGP instances satisfying the if-dimensional generalization of conditions [T][5] above is given in [5^15] 
in the framework of the WSNL and iC-TRILAT problems. 

An extension of the GB that is able to deal with sparser graphs (more precisely, Sh{v) > 3) is given 
in [37]; another extension along the same lines is given in |227j . We remark that the set of graphs such 
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that Sh{v) > 3 and the condition [21 above hold are precisely the instances of the DDGP such that K = 3 
(see Sect. I3.3T5|) : this problem is discussed extensively in |159j . The main conceptual difference between 
these GB extensions and the Branch-and-Prune (BP) algorithm for the DDGP [159] (see Sect. l3.3l below') 
is that BP exploits a given order on V (see Sect. I1.1.2p . Since the GB extensions do not make use of 
this order, they arc heuristic algorithms: if 5h{v) < 3 at iteration u, then the GB stops, but there is 
no guarantee that a different choice of "next vertex" might not have carried the GB to termination. A 
very recent review on methods based on the GB approach and on the formulation of other DGPs with 
inexact distances is given in [218j . The BP algorithm ( Alg. [1]) marks a striking difference insofar as the 
knowledge of the order guarantees the exactness of the algorithm. 

3.2.4 Graph decomposition methods 

Graph decomposition methods are mixed-combinatorial algorithms based on graph decomposition: the 
input graph G = (V, E) is partitioned or covered by subgraphs H, each of which is realized independently 
(the local phase) . Finally, the realizations of the subgraphs are "stitched together" using mathematical 
programming techniques (the global phase) . The global phase is equivalent to applying MDGP techniques 
to the minor G' of G obtained by contracting each subgraph H io a, single vertex. The nice feature of 
these methods is that the local phase is amenable to efficient yet exact solutions. For example, if H is 
uniquely realizable, then it is likely to be realizable in polynomial time. More precisely, a graph H is 
uniquely realizable if it has exactly one valid realization in modulo rotations and translations, see 
Sect. 14.1.11 A graph H is uniquely localizable if it is uniquely realizable and there is no K' > K such 
that H also has a valid realization affinely spanning M.^ . It was shown in |201j that uniquely localizable 
graphs are realizable in polynomial time (see Sect. I4.1.2p . On the other hand, no graph decomposition 
algorithm currently makes a claim to overall exactness: in order to make them practically useful, several 
heuristic steps must also be employed. 

In ABBIE [92], both local and global phases are solved using local NLP solution techniques. Once 
a realization for all subgraphs H is known, the coordinates of the vertex set Vh of H can be expressed 
relatively to the coordinates of a single vertex in Vh ; this corresponds to a starting point for the realization 
of the minor G' . ABBIE was the first graph decomposition algorithm for the DGP, and was able to realize 
sparse PDB instances with up to 124 amino acids, a considerable feat in 1995. 

In DISCO |131[ . V is covered by appropriately-sized subgraphs sharing at least K vertices. The local 
phase is solved using an SDP formulation similar to the one given in |25] . The local phase is solved using 
the positions of common vertices: these are aligned, and the corresponding subgraph is then rotated, 
reflected and translated accordingly. 

In |24j . G is covered by appropriate subgraphs H which are determined using a swap-based heuristic 
from an initial covering. Both local and global phases are solved using the SDP formulation in [22 . A 
version of this algorithm targeting the WSNL (see Sect. 14. 1[) was proposed in [23]: the difference is that, 
since the positions of some vertices is known a priori, the subgraphs H are clusters formed around these 
vertices (see Sect. 14.1."^ . 

In [105) . the subgraphs include one or more (ii'-l- l)-cliques. The local phase is very efficient, as cliques 
can be realized in linear time [1991 [S5] . The global phase is solved using an SDP formulation proposed 
in 12] (also see Sect. HX^ . 

A very recent method called 3D- ASAP [52], designed to be scalable, distributable and robust with 
respect to data noise, employs either a weak form of unique localizability (for exact distances) or spectral 
graph partitioning (for noisy distance data) to identify clusters. The local phase is solved using cither 
local NLP or SDP based techniques (whose solutions are refined using appropriate heuristics) , whilst the 
global phase reduces to a 3D synchronization problem, i.e. finding rotations in the special orthogonal 
group 50(3, M), reflections in Z2 and translations in R"^ such that two similar distance spaces have the 
best possible alignment in R'^. This is addressed using a 3D extension of a spectral technique introduced 
in |195[ . A somewhat simpler version of the same algorithm tailored for the case K = 2 (with the WSNL 
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as motivating application, see Sect. 14. 1|) is discussed in j51) . 



3.3 Discretizability 

Some DGP instances can be solved using mixed-combinatorial algorithms such as GB-based fSect. [3?273)) 
and graph decomposition based (Sect. I3.2.4p methods. Combinatorial methods offer several advantages 
with respect to continuous ones, for example accuracy and efficiency. In this section, we shall give an 
in-depth view of discretizability of the DGP, and discuss at length an exact combinatorial algorithm for 
finding all solutions to those DGP instances which can be discretized. 

We let X be the set of all valid realizations in of a given weighted graph G ~ (V, E, d) modulo 
rotations and translations (i.e. if x E X then no other valid realization y for which there exists a rotation 
or translation operator T with y Tx is in X). We remark that we allow refiections for technical 
reasons: much of the theory of discretizability is based on partial reflections, and since any reficction is 
also a partial (improper) reflection, disallowing reflections would complicate notation later on. In practice, 
the DGP system ([I]) can be reduced modulo translations by fixing a vertex vi to Xy-^ = (0, . . . , 0) and 
modulo rotations by fixing an appropriate set of components out of the realizations of the other K — 1 
vertices {v2, • . • , vk} to values which arc consistent with the distances in the subgraph of G induced by 
{v,\l<i< K}. 

Assuming X ^ 0, every x € X is a. solution of the polynomial system: 

y{u,v}eE ||.T„ =^2^, (12) 

and as such it has either finite or uncountable cardinality (this follows from a fundamental result on the 
structure of semi-algebraic sets [111 Thm. 2.2.1], also see |153] ). This feature is strongly related to graph 
rigidity (see Sect. II. l.H 14.2. 2p : specifically, \X\ is finite for a rigid graph, and almost all non-rigid graphs 
yield uncountable cardinalities for X whenever X is non-empty. If we know that G is rigid, then |X| is 
finite, and a posteriori, we only need to look for a finite number of realizations in R^: a combinatorial 
search is better suited than a continuous one. 

When A' = 2, it is instructive to inspect a graphical representation of the situation (Fig. H]). The 

3 






Figure 4: A flexible framework (left), a rigid graph (center), and a uniquely localizable (rigid) graph 
(right). 

framework for the graph ({1,2, 3, 4}, {{1, 2}, {1, 3}, {2, 3}, {2, 4}}) shown in Fig. g] (left) is flexible: any 
of the uncountably many positions for vertex 4 (shown by the dashed arrow) yield a valid realization of 
the graph. If we add the edge {1,4} there are exactly two positions for vertex 4 (Fig. |4l center), and if 
we also add {3,4} there is only one possible position (Fig. 01 right). Accordingly, if we can only use one 
distance c?24 to realize X4 in Fig. [H (left) X is uncountable, but if we can use if = 2 distances (Fig. 
center) or K + 1 = 3 distances (Fig. right) then \X\ becomes finite. The GB algorithm [53| and the 
triangulation method in [BS] exploit the situation shown in Fig. 01 (right); the difference between these 
two methods is that the latter exploits a vertex order given a priori which ensures that a solution could 
be found for every realizable graph. 
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The core of the work that the authors of this survey have been carrying out (with the help of several 
colleagues) since 2005 is focused on the situation shown in Fig. |4] (center): we do not have one position 
to realize the next vertex v in the given order, but (in almost all cases) two: x^,xl,, so that the graph 
is rigid but not uniquely so. In order to disregard translations and rotations, we assume a realization x 
of the first K vertices is given as part of the input. This means that there will be two possible positions 
for xk+1, four for xk+2, and so on. All in all, \X\ ~ 2^^^ . The situation becomes more interesting if 
we consider additional edges in the graph, which sometimes make one or both of x^ , x\ infeasible with 
respect to Eq. ([T]). A natural methodology to exploit this situation is to follow the binary branching 
process whenever possible, pruning a branch x^^ {£ € {0, 1}) only when there is an additional edge {u,v} 
whose associated distance d^v is incompatible with the position xf^. We call this methodology Branch- 
and-Prune (BP). 

Our motivation for studying non-uniquely rigid graphs arises from protein conformation: realizing 
the protein backbone in R"^ is possibly the most difficult step to realizing the whole protein (arranging 
the side chains can be seen as a subproblem [1841 1183] ). As discussed in the rest of this section, protein 
backbones conveniently also supply a natural atomic ordering, which can be exploited in various ways 
to produce a vertex order that will guarantee exactness of the BP. The edges necessary to pruning are 
supplied by NMR experiments. A definite advantage of the BP is that it offers a theoretical guarantee 
of finding all realizations in X, instead of just one as most other methods do. 

3.3.1 Rigid geometry hypothesis and molecular graphs 

Discrctizability of the search space turns out to be possible only if the molecule is rigid in physical space, 
which fails to be the case in practice. In order to realistically model the flexing of a molecule in space, 
it is necessary to consider the bond-stretching and bond-bending effects, which increase the number of 
variables of the problem and also the computational effort to solve it. However, it is common in molecular 
conformational calculations to assume that all bond lengths and bond angles are fixed at their equilibrium 
values, which is known as the rigid-geometry hypothesis |77j . 

It follows that for each pair of atomic bonds, say {u, v}, {v, w}, the covalent bond lengths duv, dvw are 
known, as well as the angle between them. With this information, it is possible to compute the remaining 
distance duw ■ Every weighted graph G representing bonds (and their lengths) in a molecule can therefore 
be trivially completed with weighted edges {w, w} whenever there is a path with two edges connecting u 
and w. Such a completion, denoted G^, is called a molecular graph I99j . Wc remark that all graphs that 
the BP can realize are molecular, but not vice versa. 

3.3.2 Development of the Branch-and-Prune algorithm 

To the best of our knowledge, the first discrete search method for the MDGP that exploits the intersection 
of three spheres in R'^ was proposed by three of the co-authors of this survey (CL, LL, NM) in 2005 jll6| . in 
the framework of a quantum computing algorithm. Quite independently, the GB algorithm was extended 
in 2008 j227| to deal with intersections of three rather than four spheres. Interestingly, as remarked in 
Sect. 13.2.31 another extension to the same case was proposed by a different research group in the same 
year |37| . By contrast, the idea of a vertex order used to find realizations iteratively was already present 
in early works in statics |1851 155] (see Sect. 14.2) ) and was first properly formalized in [M] (see Sect. 14.2^ . 

The crucial idea of combining the intersection of three spheres with a vertex ordering which would 
offer a theoretical guarantee of exactness occurred in June 2005, when two of the co-authors of this 
survey (CL, LL) met during an academic visit to Milan. The first version of the BP algorithm was 
conceived, implemented and computationally validated during the summer of 2005: this work, however, 
only appeared in 2008 |138j due to various editorial mishaps. Between 2005 and 2008 we kept on working 
at the theory of the DMDGP; wc were able to publish an arXiv technical report in 2006 |118| . which was 
eventually completed in 2009 and published online in 2011 |122| . Remarkably, our own early work on 
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BP and an early version of |227) were both presented at the International Symposium on Mathematical 
Programming (ISMP) in Rio de Janeiro already in 2006. 

Along the years we improved and adapted the original BP |138j to further settings. We precisely 
defined the DGP subclasses on which it works, and proved it finds all realizations in X for these subclasses 
[1181 11241 11221 1159j . We discussed how to determine a good vertex order automatically [115j . We tested 
and fine-tuned the BP to proteins [165| . We compared it with other methods |168| . We tried to decompose 
the protein backbone in order to reduce the size of the BP trees |171| . We adapted it to work with intervals 
instead of exact distances [1561 11281 11611 1123j . We engineered it to work on distances between atoms of 
given type (this is an important restriction of NMR experiments) [125111261 11601 [T?71ll29j . We generalized 
it to arbitrary values of K and developed a theory of symmetries in protein backbones [1421 11451 1143j . 
We exploited these symmetries in order to immediately reconstruct all solutions from just one [157[ I158| . 
We showed that the BP is fixed-parameter tractable on protein-like instances and empirically appears 
to be polynomial on proteins |144[ I140| . We derived a dual BP which works in distance rather than 
realization space [136] . We put all this together so that it would work on real NMR data [1661 1148j . 
We started working on embedding the side chains [183| . We took some first steps towards applying BP 
to more general molecular conformation problems involving energy minimization [130| . We provided an 
open-source [167] implementation and tested some parallel ones [1641 WI\ . We wrote a number of other 
surveys [119[ 11411 1121[ 1162] . but none as extensive as the present one. We also edited a book on the 
subject of distance geometry and applications [163j . 

3.3.3 Sphere intersections and probability 

For a center c S and a radius r £ M+, we denote by S^~^{c, r) the sphere centered at c with radius 
r in R^. The intersection of K spheres in might contain zero, one, two or uncountably many points 
depending on the position of the centers si, . . . ,xk and the lengths di^K+i, ■ ■ ■ ,dK,K+i of the radii. 
Call P = f]-^p. S^^^{xi,di^K+i) be the intersection of these K spheres and = {.t,; | i < K}. If 
dim aff(W^) < K — 1 then |P| is uncountable [1151 Lemma 3] (see Fig. [5]). Otherwise, if dimaff(i^^) = 
K — 1, then |P| £ {0, 1, 2} [1151 Lemmata 1-2]. We also remark that the condition dim a.S(iU~) < K — 1 



Figure 5: When three sphere centers are collinear in 3D, a non-empty sphere intersection (the thick circle) 
has uncountable cardinality. 

corresponds to requiring that CM{14~) = 0. See [172] for a detailed treatment of sphere intersections in 
molecular modelling. 

Now assume dim aff(Z//~) = K — 1, let xk+i be a given point in P and let U = U {xk+i}- The 
inequalities Ak(U) > (see Eq. ^) are called simplex inequalities (or strict simplex inequalities if 
Ak{U) > 0). We remark that, by definition of the Cayley-Menger determinant, the simplex inequalities 
are expressed in terms of the squared values d^v of the distance function, rather than the points in lA. 
Accordingly, given a weighted clique K = {U,E,d) where \U\ = K + 1, we can also denote the simplex 
inequalities as AK{U,d) > 0. If the simplex inequalities fail to hold, then the clique cannot be realized 
in R^, and P = 0. If AK{U,d) ~ the simplex has zero volume, which implies that |P| = 1 by [1151 




DISTANCE GEOMETRY PROBLEMS 24 



Lemma 1]. If the strict simplex inequalities hold, then \P\ = 2 by |115l Lemma 2] (see Fig. [S]). In 



Figure 6: General case for the intersection P of three spheres in M.^. 

summary, if CM(iY^) = then P is uncountable, if A/^([/, d) = then \P\ = 1, and all other cases lead 
to |P|G{0,2}. 

Considering the usual probability space on M^^ defined by the Lebcsguc measure, the probability of 
any sampled point belonging to any given set having Lebesgue measure zero is equal to zero. Since both 
{a; e I CM(U-)} and {x 6 | Ak ([/, d) = 0} are (strictly) lower dimensional manifolds in R , 
they have Lebesgue measure zero. Thus the probability of having |P| = 1 or P uncountable for any given 
X e M.^' is zero. Furthermore, if we assume P ^ 0, then \P\ = 2 with probability 1. We extend this notion 
to hold for any given sentence p(x): the statement "Vx G Y {p{x) with probability 1)" means that the 
statement p(a:) holds over a subset oiY that has Lebesgue measure 1. Typically, this occurs whenever p is 
a geometrical statement about Euclidean space that fails to hold for strictly lower dimensional manifolds. 
These situations, such as collinearity causing an uncountable P in Fig. [5j are generally described by 
equations. Notice that an event can occur with probability 1 conditionally to another event happening 
with probability 0. For example, we shall show in Sect. 13.3.91 that the cardinality of the solution set of 
YES instances of the '^DMDGP is a power of two with probability 1, even though a '^DMDGP instance 
has probability of being a YES instance, when sampled uniformly in the set of all '^DMDGP instances. 

We remark that our notion of "statement holding with probability 1" is different from the genericity 
assumption which is used in early works in graph rigidity (see Sect. 14.21 and [IS]): a finite set S of real 
values is generic if the elements of S are algebraically independent over Q, i.e. there exists no rational 
polynomial whose set of roots is S. This requirement is sufficient but too stringent for our aims; and 
besides, since most computer implementations will only employ (a subset of) rational numbers, it makes 
the theory completely inapplicable, as is also remarked in |92| . The notion we propose might be seen as 
an extension to Graver's own definition of genericity, which he appropriately modified to suit the purpose 
of combinatorial rigidity: all minors of the complete rigidity matrix must be nontrivial (see Sect. 14.2.21 
and [84]). 



3.3.4 The Discretizable Vertex Ordering Problem 

The theory of sphere intersections, as described in Sect. l3.3~3l implies that if there exists a vertex order on 
V such that each vertex v such that p{v) > K has exactly K adjacent predecessors, then with probability 
1 we have \X\ = 2"~^ . If there are at least K adjacent predecessors, \X\ < 2"~^ as either or both 
positions x^^x^ for v might be infeasible with respect to some distances. In the rest of the paper, to 
simplify notation we identify each vertex v E V with its (unique) rank p{v), let V = and 
write, e.g. u — v to mean p{u) — p(v) or v > K to mean p{v) > K. 



In this section we discuss the problem of identifying an order with the properties above. Formally, 
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the DVOP asks to find a vertex order on V such that G[{1, . . . , K}] is a ii'-clique and such tliat Vv > 
K (\N{v)r\^{v)\ > K). We ask that the first K vertices sliould induce a clique in G because this will allow 
us to realize the first K vertices uniquely — it is a requirement of discretizable DGPs that a realization 
should be known for the first K vertices. 

The DVOP is NP-complete by trivial reduction from K-c\iq\xc. An exponential time solution algo- 
rithm consists in testing each subset of K vertices: if one is a clique, then try to build an order by 
greedily choosing a next vertex with the largest number of adjacent predecessors, stopping whenever this 
is smaller than K. This yields an 0(n^+^) algorithm. If K is a fixed constant, then of course this 
becomes a polynomial algorithm, showing that the DVOP with fixed K is in P. Since DGP applications 
rarely require a variable A', this is a positive result. 

The computational results given in [115] show that solving the DVOP as a pre-processing step some- 
times allows the solution of a sparse PDB instance whose backbone order is not a DVOP order. This 
may happen if the distance threshold used to generate sparse PDB instances is set to values that are 
lower than usual (e.g. 5.5A instead of 6A). 

3.3.5 The Discretizable Distance Geometry Problem 

The input of the DDGP consists of: 

• a simple weighted undirected graph G = (F, E, d); 

• an integer K > 0; 

• an order on V such that: 

— for each v > K, the set N{v) f) 'y{v) of adjacent predecessors has at least K elements; 

— for each v > K, N{v) D 7(w) contains a subset [/„ of exactly K elements such that: 

* G[Uv] is a iiT-clique in G; 

* strict triangular inequalities AK-i{Uv,d) > hold (sec Eq. (0)); 

• a valid realization x of the first K vertices. 

The DDGP asks to decide whether x can be extended to a valid realization of G jll5| . The DDGP with 
fixed K is denoted by DDGP^; the DDGP3 is discussed in (TMl- 

We remark that any method that computes x^ in function of its adjacent predecessors is able to employ 
a current realization of the vertices in during the computation of Xy. As a consequence, AK-i(Uv,d) 
is well defined (during the execution of the algorithm) even though G[Uy] might fail to be a clique in 
G. Thus, more DGP instances beside those in the DDGP can be solved with a DDGP method of this 
kind. To date, we failed to find a way to describe such instances aprioristically. The DDGP is NP-hard 
because it contains the DMDGP (see Sect. 13. below), and there is a reduction from Subset-Sum [73] 
to the DMDGP ^TT2\ . 

3.3.6 The Branch-and-Prune algorithm 

The recursive step of an algorithm for realizing a vertex v given an embedding x' for G[Uv], where Uy is 
as given in Sect. 13. 331 is shown in Alg.[T] We recall that S^~^{y, r) denotes the sphere in centered at 
y with radius r. By the discretization due to sphere intersections, we note that \P\ < 2. The Branch-and- 
Prune (BP) algorithm consists in calling BP(/ir + 1, x, 0). The BP finds the set X of all valid realizations 
of a DDGP instance graph G = {V,E,d) in modulo rotations and translations [HHl HH [ISi . The 
structure of its recursive calls is a binary tree (called the BP tree), which contains 2"~^ nodes in the 
worst case; this makes BP a worst-case exponential algorithm. Fig. [7] gives an example of a BP tree. 

Realizations x G X can also be represented by sequences xi^) £ { — 1,1}" such that: (i) x{^)v = 1 
for all V < K; (ii) for all v > K, x{^)v = ^1 if o^^v < 10 and x{^)v = 1 if axy > oq, where ax — oq is 
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Algorithm 1 BP(t;, x, X) 

Require: A vertex w e 1/ \ [K], an embedding x' for G[Uy], a set X. 

2: for G P do 
4: if w = n then 

5: X^X\j{x} 

6: else 

7: BP(i> + l,a;,X) 

8: end if 

9: end for 




Figure 7: An example of BP tree on the random instance lavorll_7 |114j . Pruning edges (see 
Sect. 13.3.6.11) are as follows: N{2) = {9}, iV(3) = N{^) = {8,9,10}, iV(5) = {9,10}, iV(6) = {10}, 
iV(7) = {ll}. 

the equation of the hyperplane through x{Uy) = {xu | it G ?7„}, which is unique with probability 1. The 
vector x{^) is also known as the chirality jSOj of x (formally, the chirality is defined to be xix)v = if 
ax = ao, but since this case holds with probability 0, we disregard it). 

The BP (Alg. [T]) can be run to termination to find all possible valid realizations of G, or stopped after 
the first leaf node at level n is reached, in order to find just one valid realization of G. Compared to 
most continuous search algorithms we tested for DGP variants, the performance of the BP algorithm is 
impressive from the point of view of both efficiency and reliability, and, to the best of our knowledge, it is 
currently the only method that is able to find all valid realizations of DDGP graphs. The computational 
results in |122| . obtained using sparse PDB instances as well as hard random instances |114| . show that 
graphs with thousands of vertices and edges can be realized on standard PC hardware from 2007 in fewer 
than 5 seconds, to an LDE accuracy of at worst 0(10^*). Complete sets X of incongruent realizations 
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were obtained for 25 sparse PDB instances (generation threshold fixed at gA) having sizes ranging from 
n = 57, m = 476 to n = 3861, m = 35028. All such sets contain exactly one realization with RMSD 
value of at worst O(10~^), together with one or more isomers, all of which have LDE values of at worst 
O(10~^) (and most often O(10~^^) or less). The cumulative CPU time taken to obtain all these solution 
sets is 5.87s of user CPU time, with one outlier taking 90% of the total. 

3.3.6.1 Pruning devices We partition E into the sets Ed = {{u,v} G E \ u € Uy} and Ep = 
E \ E]j. We call Ed the discretization edges and Ep the pruning edges. Discretization edges guarantee 
that a DGP instance is in the DDGP. Pruning edges are used to reduce the BP search space by pruning 
its tree. In practice, pruning edges might make the set T in Alg. [IJhave cardinality or 1 instead of 2, 
if the distance associated with them is incompatible with the distances of the discretization edges. 

The pruning carried out using pruning edges is called Direct Distance Feasibility (DDF) , and is by far 
the easiest, most efficient, and most generally useful. Other pruning tests have been defined. A different 
pruning technique called Dijkstra Shortest Path (DSP) was considered in |122l Sect. 4.2], based on the 
fact that G is a Euclidean network. Specifically, the total weight of a shortest path from u to v provides 
an upper bound to the Euclidean distance between Xu and Xy, and can therefore be employed to prune 
positions Xy which are too far from x^ ■ The DSP was found to be effective in some instances but too often 
very costly. Other, more effective pruning tests, based on chemical observations, have been considered in 
[166] . 

3.3.7 Dual Branch-and-Prune 

There is a close relationship between the DGP^ and the EDMCP (see Sect. I2.6.2[) with K fixed: each 
DGPif instance G can be transformed in linear time to an EDMCP instance (and vice versa) by just 
considering the weighted adjacency matrix of G where vertex pairs {w, u} ^ E correspond to entries 
missing from the matrix. We shall call ^{G) the EDMCP instance corresponding to G and "^{A) the 
DGPif instance corresponding to an EDMCP instance A. 

As remarked in |174| . the completion in of a distance (sub)matrix D with the following structure: 
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can be carried out in constant time by solving a quadratic system in the unknown 5 derived from setting 
the Cayley-Menger determinant (Sect. [5]) of the distance space {X,d) to zero, where X = {xi, . . . ,a;5} 
and d is given by Eq. (jl3p . This is because the Cayley-Menger determinant is proportional to the volume 
of a 4-simplcx, which is the (unique, up to congruences) realization of the weighted 5-cliquc defined by a 
full distance matrix. Since a simplex on 5 points embedded in M?' necessarily has 4- volume equal to zero, 
it suffices to set the Cayley-Menger determinant of to zero to obtain a quadratic equation in 5. 

We denote the pair indexing the unknown distance 5 by &{D), the Cayley-Menger determinant 

of D by CM(D), and the corresponding quadratic equation in 5 by CM(D)((5) =0. If Z? is a distance 
matrix, then CM(£))((5) = has real solutions; furthermore, in this case it has two distinct solutions 5^ , 5^ 
with probability 1, as remarked in Sect. 13.31 These are two valid values for the missing distance c?i5. 
This observation extends to general K, where we consider a {K + l)-simplcx realization of a weighted 
near-clique (defined as a clique with a missing edge) on K + 2 vertices. 

3.3.7.1 BP in distance space We are given a DDGP instance with a graph G = (U, E) and a partial 
embedding x for the subgraph G[[iir]] of G induced by the set [K] of the first K vertices. The DDGP 
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order on V guarantees that the vertex of rank K +1 has K adjacent predecessors, hence it is adjacent to 
all the vertices of rank v G [K]. Thus, G[[K + 1]] is a full {K + l)-clique. Consider now the vertex of rank 
K + 2: again, the DDGP order guarantees that it has at least K adjacent predecessors. If it has K + 1, 
then G[[K + 2]] is the full {K + 2)-clique. Otherwise G[[K + 2]] is a near-clique on A' + 2 vertices with 
a missing edge {u,K + 2} for some m G [AT + 1]. We can therefore use the Cayley-Mengcr determinant 
(sec Eq. (fO)) for the special case K = 3, and Sect. [5] for the general case) to compute two possible values 
for Because the vertex order always guarantees at least K adjacent predecessors, this procedure 

can be generalized to vertices of any rank w in 1/ \ [K], and so it defines a recursive algorithm which: 

• branches whenever a distance can be assigned two different values; 

• simply continues to the next rank whenever the subgraph induced by the current K + 2 vertices is 
a full clique; 

• prunes all branches whenever the partial distance matrix defined on the current K + 2 vertices has 
no Euclidean completion. 

In general, this procedure holds for DDGP instances G whenever there is a vertex order such that 
each next vertex v is adjacent to K predecessors. This ensures G has a subgraph (containing v and K +1 
predecessors) consisting of two {K + 1) cliques whose intersection is a if-clique, i.e. a near-clique with 
one missing edge. There are in general two possible realizations in for such subgraphs, as shown in 
Fig.m 




Figure 8: On the left, a near clique on 5 vertices with one missing edge (dotted line). Center and right, 
its two possible realizations in (missing distance shown in red). 

Alg. [2] presents the dual BP. It takes as input a vertex v of rank greater than K + 1, a partial matrix 
A and a set which will eventually contain all the possible completions of the partial matrix given as 
the problem input. For a given partial matrix A, a vertex v of ^(A) and an integer £ < K, let A^ be 
the £ X i symmetric submatrix of A including row and column v that has fewest missing components. 
Whenever A^^"^ has no missing elements, the equation CM(A^5+^, (5) = is either a tautology if 
is a Euclidean distance matrix, or unsatisfiable in M otherwise. In the first case, wc define it to have 
5 ~ duv as a solution, where u is the smallest row/column index of A^^"^. In the second case, it has no 
solutions. 

Theorem 3.1 ( [137| ) At the end of Alg. [H eontains all possible completions of the input partial 
matrix. 

The similarity of Alg. [T] and [5] is such that it is very easy to assign dual meanings to the original 
(otherwise known as primal) BP algorithms. This duality stems from the fact that weighted graphs and 
partial symmetric matrices arc "dual" to each other through the inverse mappings ^ and Whereas 
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Algorithm 2 dBP(u, A, £/) 

Require: A vertex v ^ V \ [K + 1], a partial matrix A, a set £/. 



1 


P^{6\ CM(Af+2,(5) = 


2 


for (5 G P do 


3 




4 


duv ^ S 


5 


if A is complete then 


6 


£/ ^ £/U {A} 


7 


else 


8 


dBP(w + 1, A, £/) 


9 


end if 


10 


end for 



in the primal BP we decide realizations of the graph, in the dual BP we decide the completions of partial 
matrices, so realizations and distance matrix completions are dual to each other. The primal BP decides 
on points G to assign to the next vertex v, whereas the dual BP decides on distances 5 to assign 
to the next missing distance incident to v and to a predecessor of v; there are at most two choices of Xy 
as there are at most two choices for S; only one choice of Xy is available whenever v is adjacent to strictly 
more than K predecessor, and the same happens for 6; finally, no choices for Xy are available in case the 
current partial realization cannot be extended to a full realization of the graph, as well as no choices for 
S are available in case the current partial matrix cannot be completed to a Euclidean distance matrix. 
Thus, point vectors and distance values are dual to each other. The same vertex order can be used by 
both the primal and the dual BP (so the order is self-dual). 

There is one clear difference between primal and dual BP: namely, that the dual BP needs an initial 
{K + l)-cliquc, whereas the primal BP only needs an initial i^-clique. This difference also has a dual 
interpretation: a complete Euclidean distance matrix corresponds to two (rather than one) realizations, 
one being the reflection of the other through the hyperplane defined by the first K points (this is the 
"fourth level symmetry" referred to in jl22l Sect. 2.1] for the case K = 3). We remark that this difference 
is related to the reason why the exact SDP-based polynomial method for realizing uniquely localizable 
(see Sect. I3.2.4[) networks proposed in |201| needs the presence of at least K + 1 anchors. 



3.3.8 The Discretizable Moleculeir Distance Geometry Problem 

The DMDGP is a subset of instances of the DDGP3; its generalization to arbitrary K is called '^DMDGP. 
The difference between the DMDGP and the DDGP is that Uy is required to be the set of K immediate 
(rather than arbitrary) predecessors of v. So, for example, the discretization edges can also be expressed 
as Ed = {{u,v} & E \ \u — v\ < K} (see Sect. I3.3.6.ip . and x{Uy) = {xy-K, ■ ■ ■ ,Xy-i}. This restriction 
originates from the practically interesting case of realizing protein backbones with NMR data. 

Since such graphs are molecular (see Sect. I3.3.ip . they have vertex orders guaranteeing that each 
vertex > 3 is adjacent to two immediate predecessors, as shown in Fig. [31 The distance dy^y-2 is 




Figure 9: Vertex v is adjacent to its two immediate predecessors. 
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computed using the covalent bond lengths and the angle (w — 2, w — 1, w), which are known because of the 
rigid geometry hypothesis [77|. In general, this is only enough to guarantee discretizability ioT K = 2. 
By cxploting further protein properties, however, we were able to find a vertex order (different from the 
natural backbone order) that satisfies the DMDGP definition (see Sect. I3.5T^ . 

Requiring that all adjacent predecessors of v must be immediate provides sufficient structure to prove 
several results about the symmetry of the solution set X (Sect. I3.3.9P and about the fixed-parameter 
tractabililty of the BP algorithm (Alg. [T]) when solving '^DMDGPs on protein backbones with NMR 
data fSect. lHXTUl) . The DMDGP is NP-hard by reduction from Subset-Sum [Hg. The resuh can be 
generahzed to the ''DMDGP [TiD] . 



3.3.8.1 Mathematical programming formulation For completeness, and convenience of math- 
ematical programming versed readers, we provide here a MP formulation of the DMDGP. We model 
the choice between , xl by using torsion angles jl20j : these are the angles (j>v defined for each v > 3 
by the planes passing through Xy^3,Xv-2,Xy-i and Xy-2,Xv-i,Xv (Fig. [TU)) . More precisely, we sup- 
pose that the cosines Cy — cos{(j)y) of such angles are also part of the input. In fact, the values for 
c : y \ {1, 2, 3} — !> M can be computed using the DMDGP structure of the weighted graph in constant 
time using [901 Eq. (2.15)]. Conversely, if one is given precise values for the torsion angle cosines, then 
every quadruplet (a;„_3, Xv-2,Xv-i,Xy) must be a rigid framework (for v > 3). We let a : y \ {1, 2} — )■ R'^ 



i - 3 




Figure 10: The torsion angle 4>i. 

be the normal vector to the plane defined by three consecutive vertices: 

i j k 

VU > 3 ay = Xy-2,1 - Xy-l^l Xy -2 ,2 ~ X y - I ^2 X y -2 ,3 " Xy - I ^3 

{xv-2.2 - 2^1,-1, 2)(a:^i),3 - a;t,-i,3) - ixy-2,3 - Xy-\^z){xy^2 - Xy-\,2) 
{xv~2,\ - Xy-\,Y){xy,z - Xy-x^-i) " (21,-2,3 - 2:„_i,3)(x^,i - a;.„_i,i) 

{Xv-2S - Xy-\,{){Xy^2 - Xy-\^2) " (2t,_2,2 - Xy^x,2){Xv,l ~ Xy-\s) 

so that ay is expressed a function ay{x) of x and represented as a matrix with entries Xy-k. Now, for 
every w > 3, the cosine of the torsion angle is proportional to the scalar product of the normal vectors 
ay-\ and a^. 

Vw > 3 ay^\(x) ■ ay(x) = |jat,_i(a;)||||at,(x)|| cosi^i,. 
Thus, the following provides a MP formulation for the DMDGP: 



s.t. 



E {l^y-Xy\^-dlyf 

{u.v'}eE 

Vv > 3 ay-\{x) ■ ay(x) 



||a„_i(a;)||||a„(x)||c„. 



(14) 



We remark that generalizations of to arbitrary (fixed) K arc possible by using GraBmann-Pliickcr 
relations |2D] (also see [SUl Ch. 2]). 
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3.3.9 Symmetry of the solution set 

When we first experimented with the BP on the DMDGP, we observed that \X\ was always a power of two. 
An initial conjecture in this direction was quickly disproved by hand-crafting an instance with 54 solutions 
derived by the polynomial reduction of the Subset-Sum to the DMDGP used in the NP-hardness proof 
of the DMDGP |122| . Notwithstanding, all protein and protein-like instances wc tested yielded \X\ = 2^ 
for some integer £. Years later, we were able to prove that the conjecture holds on '^DMDGP instances 
with probability 1. and also derived an infinite (but countable) class of counterexamples [145j . Aside 
from explaining our conjecture arising from empirical evidence, our result is also important insofar as it 
provides the core of a theory of partial reflections for the '^DMDGP. References to partial reflections are 
occasionally found in the DGP literature [nU 1201] , but our group-theoretical treatment is an extensive 
addition to the current body of knowledge. 

In this section we give an exposition which is more compact and hopefully clearer than the one in 
[145j . We focus on '^DMDGP and therefore assume that Uy contains the K immediate predecessors of v 
for each v > K. We also assume G is a YES instance of the '^DMDGP, so that \P\ ~ 2 with probability 
1. 

3.3.9.1 The discretization group Let Gd = (V^, ^-Djrf) be the subgraph of G consisting of the 
discretization edges, and Xd be the set of realizations of Gd] since Gd has no pruning edges by definition, 
the BP search tree for Gd is a full binary tree and \Xd\ = 2"^^. The discretization edges arrange the 
realizations so that, at level I, there are 2^~^ possible positions for the vertex v with rank I. We assume 
that \P\ = 2 (sec Alg. [T]) at each level v of the BP tree, an event which, in absence of pruning edges, 
happens with probability 1. Let P = {a;^,a;^} be the two possible realizations of f at a certain recursive 
call of Alg. [T] at level v of the BP tree; then because P is an intersection of K spheres, is the reflection 
of through the hyperplane defined by x{Uv) = {xy-K, • ■ • , x^-i}. Wc denote this reflection operator 
by Rl. 

Theorem 3.2 (Cor. 4.6 and Thm. 4.9 in |l45j ) With probability 1, for all v > K and u < v — K 

there is a set H"^"" of 2""""-'^ real positive values such that for each x ^ X we have \\xy — £ i/"". 
Furthermore, \/x' G X, \\xy — — \\x'^ — a;„|| if and only if x[, G {xy, R'^^^ (x^)} . 

We sketch the proof in Fig. [11] for K ^ 2; the solid circles at levels 3,4,5 mark the locus of feasible 
realizations for vertices at rank 3,4,5 in the DMDGP order. The dashed circles represent the spheres 
S!^y (see Alg.[T|). Intuitively, two branches from level 1 to level 4 or 5 will have equal segment lengths but 
different angles between consecutive segments, which will cause the end nodes to be at different distances 
from the node at level 1. Observe that the number of solid circles at each level is a power of two where 
the exponent depends on the level index £, and each solid circle contains exactly two realizations (that 
are reflections of each other) of the same vertex at rank £. 

We now give a basic result on reflections in R^. For any nonzero vector y G K^^ let TZ{y) be the 
reflection operator through the hyperplane passing through the origin and normal to y. If y is normal to 
the hyperplane deflncd by Xy-K-, ■ ■ ■ , Xy-i-, then TZ^ = R^. 

Lemma 3.3 (Lemma 4.2 in [140] ) Let x ^ y E K^^ and z G R^^ such that z is not in the hyperplanes 
through the origin and normal to x,y. Then TZ{x)TZ{y)z — TZ{TZ{x)y)TZ{x)z. 

Thm. 13.31 provides a commutativity for reflections acting on points and hyperplanes. Fig. 1121 illustrates 
the proof for K = 2. 

For V > K and x G X we now deflnc partial reflection operators: 

gy{x) = {xi, . . . ,Xy-i, Rl{xy), . . . , Rl{xn)). (15) 
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f 1 




Figure 11: A pruning edge {1,4} prunes either !/6,J^7 or 1^5, t's- 




Figure 12: Reflecting through TZ{y) first and TZ{x) later is equivalent to reflecting through TZ{x) first and 
the reflection of TZ{y) through TZ{x) later. 

The (7t,'s map a realization x to its partial reflection with first branch at v. It is easy to show that the 
(7i,'s are injective with probability 1 and idempotent. 

Lemma 3.4 (Lemma 4.3 in [140]) For x £ X and u,v £V such that u,v > K, gugv{x) = gyQuix). 

We define the discretization group to be the symmetry group Qd = (.9i. | v > K) generated by the 
partial reflection operators g^,. 

Corollary 3.5 With probability 1, Qjj is an Abclian group isomorphic to Cj"^ (the Cartesian product 
consisting of n ~ K copies of the cyclic group of order 2). 
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For all V > K let = (1, . . . , 1, —ly, . . . , —1) be the vector consisting of one's in the first v — 1 components 
and —1 in the last components. Then the actions are naturally mapped onto the chirality functions. 

Lemma 3.6 (Lemma 4.5 in [140]) For all x ^ X , x{9v{x)) = x(a;) o where o is the Hadamard 
product. 

This follows by definition of and of chirality of a realization. Since, by Alg. [TJ each x G X has a 
different chirality, for all x,x' G X there is g G Gd such that x' ~ g{x), i.e. the action of Gd on X is 
transitive. By Thm. 13.21 the distances associated to the discretization edges are invariant with respect 
to the discretization group. 

3.3.9.2 The pruning group Consider a pruning edge {u, v} E Ep. By Thm. [321 with probability 1 
we have duv G i?"", otherwise G cannot be a YES instance (against the initial assumption). Also, again 
by Thm. 13.21 duv = \\xu — x^W 7^ ||(7^(x)„ — 5'u;(a:^)u|| for all w S {u + K + 1, . . . ,v} (e.g. the distance 
— i/gll in Fig. [TT]is different from all its reflections — i/hW, with h G {10, 11, 12}, w.r.t. 54,35). We 
therefore define the pruning group 

Gp = {gw \ w> K A V{u, v} eEp {w ^ {u + K + 1,..., v})). 

By definition, Gp Gd and the distances associated with the pruning edges are invariant with respect 
to Gp. 

Theorem 3.7 (Thm. 4.6 in [145]) The action oj Gp on X is transitive with probability 1. 
Theorem 3.8 (Thm. 4.7 in [140] ') With probability 3£ G N |X| = 2^. 

Proof. The argument below holds with probability 1. Since Gd — Cj"^^, \Gd\ ~ 2"^"^ . Since Gp < Gd, 
\Gp\ divides the order of \Gd\, which implies that there is an integer £ with \Gp\ — 2^. By Thm. [3771 the 
action of Gp on X only has one orbit, i.e. Gpx = X for any x X . By idcmpotency, for g,g' G Gp, if 
gx = g'x then g = g' . This implies \Gpx\ = \Gp\- Thus, for any x G X, \X\ = \Gpx\ = \Gp\ = 2^. □ 



3.3.9.3 Practical exploitation of symmetry These results naturally find a practical application 
to speed up the BP algorithm. The BP proceeds until a first valid realization is identified. It can be 
shown that, at that point, a set of generators for the group Gp are known. These are used to generate all 
other valid realizations of the input graph, up to rotations and translations [1571 1158[ . Empirically, this 
cuts the CPU time to roughly 2/|X| (the factor 2 is due to the fact that the original BP already takes 
one reflection symmetry into account, see |1221 Thm. 2]). 

3.3.10 Fixed parameter tractability 

As the theory of partial reflections, the proof that the BP is Fixed-Parameter Tractable (FPT) on proteins 
also stems from empirical evidence. All the CPU time plots versus instance size for the BP algorithm on 
protein backbones look roughly linear, suggesting that perhaps such instances are a "polynomial case" of 
the DMDGP. The results that follow provide sufficient conditions for this to be the case. We were able 
to verify empirically that PDB proteins conform to these conditions. These results are a consequence of 
the theory in Sect. 13.3.91 insofar as they rely on an exact count of the BP tree nodes at each level. We 
formalize this in a DAG P„„ that represents the number of valid BP search tree nodes in function of 
pruning edges between two vertices u,v £ V such that v > K and u < v — K (see Fig. 1131) . The first 
row in Fig. [T31 shows different values for the rank of v w.r.t. w; an arc labelled with an integer i implies 
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Figure 13: Number of valid BP nodes (vertex label) at level u + K + £ (column) in function of the pruning 
edges (path spanning all columns). 



the existence of a pruning edge {u + i,v} (arcs with V-cxpressions replace parallel arcs with different 
labels). An arc is unlabelled if there is no pruning edge {w^v} for any w £ {w, . . .,v ^ K — 1}. The 
vertices of the DAG are arranged vertically by BP search tree level, and are labelled with the number of 
BP nodes at a given level, which is always a power of two by Thm. 13.81 A path in this DAG represents 
the set of pruning edges between u and v, and its incident vertices show the number of valid nodes at the 
corresponding levels. For example, following unlabelled arcs corresponds to no pruning edge between u 
and V and leads to a full binary BP search tree with 2""^ nodes at level v. 

For a given Gd-, each possible pruning edge set Ep corresponds to a path spanning all columns in 
Vin- Instances with diagonal fProo. l3J|) or below-diagonal (Prop. [3?T0| Ep paths yield BP trees whose 
width is bounded by 0(2"°) where vq is small w.r.t. n. 

Proposition 3.9 (Prop. 5.1 in [140]) If 3vo > K s.t. Wv > vq 3u < v — K with {u,v} e Ep then the 
BP search tree width is hounded by 2'"°~^ . 

This corresponds to a path pg = (1,2,..., 2'"°~^^ , . . . , 2'"°~^^) that follows unlabelled arcs up to level vq 
and then arcs labelled vq — K — 1, vq — K — IVvq — K, and so on, leading to nodes that are all labelled 
with 2^'«"-^ (Fig.m top). 

Proposition 3.10 (Prop. 5.2 in [140]) // 3wo > K such that every subsequence s of consecutive ver- 
tices > Vq with no incident pruning edge is preceded by a vertex Vs such that 3us < Vs {vg — Us > 
\s\ A {us,Vs} e Ep), then the BP search tree width is bounded by 2'"°~^ . 

This situation corresponds to a below-diagonal path (Fig. [T31 bottom). In general, for those instances 
for which the BP search tree width has a 0(2"" logn) bound, the BP has a worst-case running time 
0(2''''L2'°s") = 0{Ln), where L is the complexity of computing T. Since L is typically constant in n 
[St) . for such cases the BP runs in time 0(2""n). Let V = {v G V \ 3i & N {v = 2^)}. 
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Figure 14: A path pg yielding treewidth 4 (top) and another path below Pq (bottom). 

Proposition 3.11 (Prop. 5.3 in |140| ) If3vo > K s.t. for allv G V'^V with v > vq there isu < v—K 
with {u,v} € Ep then the BP search tree width at level n is bounded by 2"°n. 

This corresponds to a path roughly along the diagonal apart from logarithmically many vertices in V 
(those in V'), at which levels the BP doubles the number of search nodes (Fig. [T5|) . For a pruning edge 
set Ep as in Prop. [XTTl or yielding a path below it, the BP runs in 0{2'"''n'^). 

3.3.10.1 Empirical verification On a set of 45 protein instances from the Protein Data Bank 
(PDB), 40 satisfy Prop.EiH and 5 satisfy Prop. EZHl all with = 4 [TiO] . This is consistent with the 
computational insight |122j that BP empirically displays a polynomial (specifically, linear) complexity on 
real proteins. 



3.4 Interval data 

In this section we discuss methods that target an MDGP variant, called iMDGP, which is closer to the 
real NMR data: edges {u, v} £ E are weighted with real intervals duv = d^^] instead of real values. 
These intervals occur in practice because, as all other physical experiments, NMR outputs data with some 
uncertainty, which can be modelled using intervals. The iMDGP therefore consists in finding x G 
that satisfies the following set of nonlinear inequalities: 

y{u,v}eE dt<\\xu-x4<d^,. (16) 
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Figure 15: A path yielding treewidtli 0{n). 



The MP formulation ([7]) can be adapted to deal with this situation in a number of ways, such as, e.g.: 

min ^ (max(d^^ - - x„||,0) + max(||x„ - a;i,|| - (i^„,0)), (17) 

{u,v}eE 

min V (max((d^j2 _ 11^^^ _^^^||2^0) +max(||x„ -.T„|p - (d^j2^0), (18) 

{u,v}eE 

min y {msix^iid^j^ ~\\xu~x,W',0)+max^{\\xu-x,\\^ -{d^,)^,0)). (19) 

{«,i>}e_E 

Problem (|19p is often appropriately modified to avoid bad scaling (which occurs whenever the observed 
distances differ in the order of magnitude): 



min (max^ ( 

{u,«}e_E 



,0) + max^(- 



-^,0)). 



(20) 



3.4.1 Smoothing-based methods 



Several smoothing-based methods (e.g. DGSOL and DCA, see Sect. 13. have been trivially adapted to 
solve (dH) and/or (^0]) . 



3.4.1.1 Hyperbolic smoothing The hyperbolic smoothing described in |202| is specifically suited to 
the shape of each summand in (|17p , as shown in Fig. [ini The actual solution algorithm is very close to the 
one employed by DGSOL (see Sect. 13. 2T^ . Given the fact that the smoothing is not "general-purpose" 
(as the Gaussian transform is), but is specific to the problem at hand, the computational results improve. 
It should be noted, however, that this approach gives best results for near cubic grid arrangements. 
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Figure 16: The function niax(x, 0) and its hyperbolic smoothing F{x, A). 

3.4.2 The EMBED algorithm 

The EMBED algorithm, proposed by Crippen and Havel }50j . first completes the missing bounds and 
refines the given bounds using triangle and tetrangle inequalities. Then, a trial distance matrix D' is 
randomly generated, and a solution is sought using a matrix decomposition method [28] . Since the 
distance matrix D' is not necessarily Euclidean [ST] , the solution may not satisfy ([TS]) . If this is the case, 
the final step of the algorithm is to minimize the distance violations using the previous solution as the 
initial guess. More details can be found in |219l[55] . 

3.4.3 Monotonic Basin Hopping 

A Monotonic Basin Hopping (MBH) algorithm for solving p^ - ipU]) is employed in [53]. Let ^ be 
the set of local optima of (dj) and ^ : ^ ^(M^) (where I3^{S) denotes the power set of 5") be 
some appropriate neighbourhood structure. A artial order □ on ^ is assumed to exist: x Z\ y implies 
y S ,y\^{x) and f{x) > f{y). A funnel is a subset ^ C ^ such that for each x G there exists a chain 
x = □ □ • ■ • □ X* = min^ (the situation is described in Fig. [T7| . The MBH algorithm is as 
follows. Starting with a current solution x G J^, sample a new point x' £ ,y/{x) and use it as the starting 
point for a local NLP solver; repeating this sufficiently many times will yield the next optimum x^ in the 
funnel. This is repeated until improvements are no longer possible. The MBH is also employed within a 
population- based metaheuristic called Population Basin Hopping (PBH), which explores several funnels 
in parallel. 

3.4.4 Alternating Projections Algorithm 

The Alternating Projection Algorithm (APA) |177| is an application of the more general Successive 
Projection Methodology (SPM) [SHI 1211] to the zMDGP. The SPM takes a starting point and projects it 
alternately on the two convex sets, attempting to reach a point in their intersection (Fig. I18|). 

In the APA, the starting point is a given pre-distance matrix D = {5uv)i i-e. an n x n symmetric matrix 
with non-negative components and zero diagonal. D is generated randomly so that d^^ < Suv ^ d^^y for all 
{u, v} G E and dm, = otherwise. By Schoenberg's Theorem [221 and Eq. ([5]), if we let P = 7-^11^ and 
A = —^PDP, where / is the nxn identity matrix and 1 is the all-one n- vector. D is a. Euclidean distance 
matrix if and only if A is positive semi-definite. Notice that P is the orthogonal projection operator on 
the subspace M ~ {x G M" | x^l = 0} of vectors orthogonal to 1, so Z? is a Euclidean distance matrix 
if and only if D is negative semidefinite on M |79j . On the other hand, a necessary condition for any 
matrix to be a Euclidean distance matrix is that it should have zero diagonal. This identifies the two 
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Figure 17: The dashed horizontal hnes indicate the extent of the neighbourhoods. The set ,^ = {x, x^,x*} 
is a funnel, because x Zi Zi x* = min^. The set {x*,y} is not a funnel, as y ^ yi^{x*). 




Figure 18: The SPM attempts to find a point in the intersection of two convex sets. 

convex sets on which the SPM is run: the set V of matrices which are negative semidefinite on M , and 
the set Z of zero-diagonal matrices. The projection operator for V is Q{D) = PUA~UP, where UAU is 
the spectral decomposition of D and A~ is the nonpositive part of A, and the projection operator for Z 
is Q'Id) = D-dmg{D). 

Although the convergence proofs for the SPM assumes an infinite number of iterations in the worst 
case, empirical tests suggest that five iterations of the APA are enough to get satisfactory results. The 
APA was tested on the bovine pancreatic trypsin inhibitor protein (qlq) , which has 588 atoms including 
side- chains. 

3.4.5 The GNOMAD iterative method 

The GNOMAD algorithm |225| (see Alg. is a multi-level iterative method, which tries to arrange 
groups of atoms at the highest level, then determines an appropriate order within each group using the 
contribution of each atom to the total error, and finally, at the lowest level, performs a set of atom 
moves within each group in the prescribed order. The method exploits several local NLP searches (in 
low dimension) at each iteration, as detailed below. The constraints exploited in Step [7] are mostly given 
by van der Waals distances |189) . which are physically inviolable separation distances between atoms. 
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Algorithm 3 GNOMAD 
1: {Ci, . . . , Ce} is a vertex cover for V 
2: for i e {1, ...,£} do 

3: while termination condition not met do 
4: determine an order < on Ci 
5: for V e {Ci, <) do 

6: find search direction A^, for Xi, (obtained by solving an NLP locally) 

7: determine step Sy minimizing constraint infeasibility 

9: end for 
10: end while 
11: end for 



3.4.6 Sthochastic Proximity Embedding heuristic 

The basic idea of the Stochastic Proximity Embedding (SPE) |230j heuristic is as follows. All the atoms 
are initially placed randomly into a cube of a given size. Pairs of atoms in E are repeatedly and randomly 
selected; for each pair {u, v}, the algorithm checks satisfaction of the corresponding constraint in ([T5)) . If 
the constraint is violated, the positions of the two atoms are changed according to explicit formulae in 
order to improve the current embedding (two examples are shown in Fig. I19[). 



d u d V 

A A 

Figure 19: Local changes to positions according to discrepancy with respect to the corresponding distance. 



Algorithm 4 SPE Heuristic 

while termination condition not met do 
Pick {u,v} e E {\\xu - a;„|| ^ duv) 
Update A 

Let 
end while 



The SPE heuristic is shown in Alg. |4l SPE offers no guarantee to obtain a solution satisfying all 
constraints in , however the "success stories" reported in [57] seem to indicate this as a valid method- 
ology. 

3.5 NMR data 

Nuclear Magnetic Resonance experiments are performed in order to estimate distances between some 
pairs of atoms forming a given molecule [228]. In solution, the molecule is subjected to a strong external 
magnetic field, which induces the alignment of the spin magnetic moment of the observed nuclei. The 
analysis of this process allows the identification of a subset of distances for certain pairs of atoms, mostly 
those involving hydrogens, as explained in the introduction (p. 2]). In proteins, nuclei of carbons and 
nitrogens are also sometimes considered. 

It is important to remark that some NMR signals may fail to be precise, because it is not always 
possible to distinguish between the atoms of the molecule. We can have this situation, for example, in 
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proteins containing amino acids such as valines and leucines. In such a case, the distance restraints (a 
term used in proteomics meaning "constraints" ) involve a "pseudo-atom" that is placed halfway between 
the two undistinguished atoms [229] . Once the upper bound for the distance has been chosen when 
considering the pseudo-atom, its value is successively increased in order to obtain an upper bound for 
the real atoms. 

There are also other potential sources of errors that can affect NMR data. If the molecule is not 
stable in solution, its conformation may change during the NMR experiments, and therefore the obtained 
information could be inconsistent. Depending on the machine and on the magnetic field, some noise 
may spoil the quality of the NMR signals from which the intervals are derived. Moreover, due to a 
phenomenon called "spin diffusion", the NMR signals related to two atoms could also be influenced by 
neighboring atoms [H]. 

Fortunately, for molecules having a known chemical composition, such as proteins, there are a priori 
known distances that can be considered together with the ones obtained through NMR experiments. 
If two atoms are chemically bonded, their relative distance is known; this distance is subject to small 
variations, but it can still be considered as fixed in several applications (see the rigid geometry hypothesis, 
Sect. l3.3.T|) . Moreover, the distance between two atoms bonded to a common atom can also be estimated, 
because they generally form a specific angle that depends upon the kind of involved atoms. Such distances 
can therefore be considered precise, and provide valuable information for the solution of distance geometry 
problems (this follows because protein graphs are molecular, see Sect. I3.3TT|) . 

As explained in the introduction, on p. |4j the output of a Nuclear Magnetic Resonance experiment 
on a given molecule can be taken to consist of a set of triplets {{a,b},d,q), meaning that q pairs of 
atoms of type a,b were observed to have distance d [T7|. It turns out that NMR data can be further 
manipulated so that it yields a list of pairs {u, v} of atoms with a corresponding nonnegative distance c?„„. 
Unfortunately this manipulation is rather error-prone, resulting in interval-type errors, so that the exact 
inter-atomic distances duv are in fact contained in given intervals [d^^,d^y] [17]. For practical reasons, 
NMR experiments are most often performed on hydrogen atoms |17| (although sometimes carbons and 
nitrogens are also considered) . Other known molecular information includes [1891 162] : the number and 
type of atoms in the molecules, all the covalent bonds with corresponding Euclidean distances, and all 
distances between atoms separated by exactly two covalent bonds. 

3.5.1 Virtual backbones of hydrogens 

In order to address the NMR limitation concerning the lack of data reliability for inter-atomic distances of 
non-hydrogen atoms, we define atomic orders limited to hydrogens, and disregard the natural backbone 
order during discretization. Even though we showed that this approach works on a set of artificially 
generated instances jl29| , we remarked its limitations when we tried to apply it to real NMR data. These 
limitations have been addressed by using re-orders (see Sect. I3.5.2|) . 

3.5.2 Re-orders and interval discretization 

In jl23j we define an atomic ordering which ensures that every atom of rank > 3 is adjacent to its three 
immediate predecessors by means of either real- valued distances d, or interval distances d that arise from 
geometrical considerations rather than NMR experiments. Specifically, with reference to Fig. [TOj the 
distance di-^^i belongs to a range determined by the uncertainty associated with the torsion angle (fii. 

We exploited three protein features to this aim: (i) using hydrogen atoms off the main backbone 
whenever appropriate, (ii) using the same atom more than once, (iii) remarking that interval distances d 
can be replaced with finite (small) sets D of real- valued distances. Considering these properties, we were 
able to define a new atomic ordering for which v can be placed in a finite number of positions in the set 
{0, 1,2, 2IDI}, consistently with the known positions of the three immediate predecessors of v. Feature 
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(i) allows us to exploit atoms for which NMR data are available. Feature (ii) allows us to exploit more 
than just two bond lengths on atoms with valence > 2, such as carbons and nitrogens, by defining an 
order that includes the atom more than once. Since atoms are repeated in the order, we call these orders 
re-orders |123) . Feature (iii) rests on an observation concerning the resolution scope of NMR experimental 
techniques |170) . Fig. 1201 shows a rc-ordcr for a small protein backbone containing 3 amino acids. 




Figure 20: The order used for discretizing MDGPs with interval data. 



Re-orders (wi, . . . , Vp) deserve a further remark. We stressed the importance of strict simplex inequal- 
ities in Sect. 13.3751 but requiring that Vi = vj for some i ^ j introduces a zero distance d{vi,Vj) = 0. If 
this distance is ever used inappropriately, we might end up with a triangle with a side of zero length, 
which might in turn imply an infinity of possible positions for the next atom. We recall that, for any 
V > K, strict simplex inequalities /^k-i{Uv) > in dimension K — 1 are necessary to discretization, as 
they avoid unwanted aflane dependencies (see e.g. Fig.[S]). By contrast, if Aif (C/„ U{w}) > hold, then we 
have a iiT-simplex with nonzero volume, which has two possible orientations in R^: in other words, the 
two possible positions for Xy are distinct. If Ak{Uv U {v}) ~ 0, however, then there is just one possible 
position for Xy. Thus, to preserve discretization, zero distances can never occur between pairs Vi, vj fewer 
than K atoms apart, but they may occur for \i — j| = K: in this case we shall have no branching at level 
max(z, j). 

Re-orders make it possible to only employ non-NMR distances for discretization. More precisely, over 
each set of three adjacent predecessors, only one is related by an interval distance; this interval, however, 
is not due to experimental imprecision in NMR, but rather to a molecular property of torsion angles. In 
particular, we can compute tight lower and upper bounds to these intervals; consequently, they can be 
discretized without loss of precision |123j . Wc refer to such intervals as discretizable. 



3.5.3 Discrete search with interval distances 



The interval BP (iBP) [123j is an extension of the BP algorithm which is able to manage interval 
data. The main idea is to replace, in the sphere intersections necessary for computing candidate atomic 
positions, a sphere by a spherical shell. Given a center c G M.^ and an interval d = [d^, d^] the spherical 
shell centered at c w.r.t. d is S^~^{c,d'-') \ S^~^{c,d^). With K = 3, the intersection of two spheres 
and a spherical shell gives, with probability one, two disjoint curves in three-dimensional space (Fig. [21]). 
The discretization is still possible if some sample distances arc chosen from the interval associated to the 
curves [T7D) . 

Similarly to the basic BP algorithm, the two main components of iBP are the branching and the 
pruning phases. In the branching phase, we can have 3 different situations, depending on the distance 
d{i — 3,i) (see Fig. [20]). If d{i — 3,i) = 0, the current atom i already appeared previously in the order, 
which means that the only feasible position for i is the same as z — 3. If d{i — 3,i) is a precise distance. 
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Figure 21: The intersection of two spheres with a spherical shell. 

then 3 spheres are intersected, and only two positions are found with probability one. Finally, if d{i — 3, i) 
is a discretizable interval [df_3 i,dy_^ J, as specified in Sect. I3.5T21 we choose D values from the interval. 
This yields a choice oi 2D candidate atomic solutions for i. 

If the discretization order in Fig.[20]is employed for solving NMR instances, (precise) distances derived 
from the chemical composition of proteins are used for performing the discretization, whereas interval 
distances from NMR experiments arc used for pruning purposes only. The consequent search tree is no 
longer binary: every time a discretizable interval is used for branching, the current node has at most 2D 
subnodcs. The advantage is that the generation of the search tree is not affected by experimental errors 
caused by the NMR machinery. 

In order to discretize instances related to entire protein conformations, it is necessary to identify 
a discretization order for all side chains for the 20 amino acids that can be involved in the protein 
synthesis. This is a nontrivial task, because side chains have more complex structures with respect to 
the part which is common to each amino acid, and they may contain many atoms. However, side chains 
can be of fundamental importance in the identification of protein conformations, because many distances 
obtained by NMR experiments may regard hydrogen atoms contained in side chains. First efforts towards 
extending the BP algorithm so that it can calculate the whole three-dimensional structure of a protein, 
including its side chains, can be found in [183| . 



4 Engineering applications 

In this section, we discuss other well-known applications of distance geometry: wireless networks, statics, 
data visualization and robotics. In wireless networks, mobile sensors can usually estimate their pairwise 
distance by measure how much battery they use in order to communicate. These distances are then 
used to find the positions of each sensor (see Sect. 14. ip . Statics is the field of study of the equilibrium 
of rigid structures (mostly man-made, such as buildings or bridges) under the action of external forces. 
A well-known model for such structures is the bar-and-joint framework, which is essentially a weighted 
graph. The main problem is that of deciding whether a given graph, with a given distance function on 
the edges, is rigid or not. An associated problem is that of deciding whether a given graph models a rigid 
structure independently of the distance function (see Sect. 14. 2| ). 
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4.1 Wireless sensor networks 

The position of wireless mobile sensors (e.g. smartphones, identification badges and so on) is, by its very 
definition, local to the sensor carrier at any given time. Notwithstanding, in order to be able to properly 
route communication signals, the network routers must be aware of the sensor positions, and adapt routes, 
frequencies, and network ID data accordingly. The information available to solve this problem is given 
by the fact that mobile sensors are always aware of their neighbouring peers (to within a certain radius r 
from their positions, which wc shall assume constant), as well as of the amount of battery charge they use 
in order to communicate with each other sensor in their neighbourhood. It turns out that this quantity 
is strongly correlated with the Euclidean distance between the communicating sensors |187j . Moreover, 
certain network elements, such as routers and wireless repeaters, are fixed, hence their positions are 
known (such elements are called anchors or beacons). The problem of determining the sensor positions 
using these data was deemed as an important one from the very inception of wireless networks |222[ . 
There are several good reasons why Global Positioning System (GPS) enabled devices may not be a valid 
alternative: they are usually too large, they consume too much power, and they need a line of sight with 
the satellites, which may not always be the case in practice (think for example of localizing sensors within 
a building) |187j . This problem is formalized as the WSNL (see Item [U in the list of Sect. II. 2| ). 

In practice, K € {2, 3}. The 3D case might occur when a single network is spread over several floors of 
a building, or whenever a mobile battlefield network is parachuted over a mountainous region. Moreover, 
because the realization represents a practically existing network, an important question is to determine 
what amount of data suffices for the graph to have a unique realization in R^. This marks a striking 
difference with the application of DG techniques to molecular conformation, where molecules can exist 
in different isomers. 

The earliest connections of WSNL with DG arc an SDP formulation [SD] for a relaxation of the 
problem where the Euclidean distance between two sensors is at most the corresponding edge weight, and 
an in-depth theoretical study of the WSNL from the point of view of graph rigidity [51] (see Sect. I4.2|l . 

4.1.1 Unique realizability 

In |69l in] , the WSNL is defined to be solvable if the given graph has a unique valid realization, a notion 
which is also known as global rigidity. A graph is globally rigid if it has a generic realization x, and 
for all other realizations x', a; is congruent to x' . For example, if a graph has a X-trilatcration order, 
then it is globally rigid: comparing with DVOP orders, where each vertex is adjacent to K predecessors, 
the additional adjacency makes it possible to identify at most one position in M.^ where the next vertex 
in the order will be placed, if a position for all predecessors is already known. Any graph possessing a 
A'-trilateration order is called a K -trilateration graph. Such graphs are globally rigid, and can be realized 
in polynomial time by simply remarking that the BP would never branch on such instances. 

A graph G ~ (V, E) is redundantly rigid if (V, E \ {e}) is rigid for all e & E. It was shown in [55J 25] 
that G is globally rigid for K = 2\i and only if either G is the 2-clique or 3-clique, or G is 3-connected and 
redundantly rigid. Hendrickson had conjectured in [91] that these conditions would be sufficient for any 
value of K, but this was disproved by Connelly [3J. He also proved, in [JS], that if a generic framework 
(G, x) has a self-stress (see Sect. 14.2.11) lo : E ^M. such that the n x n stress matrix, with (u, ?;)-th entry 
(—uJuv) if {u, v} € E, J2t^s{v) '^ut if M = u, and otherwise, has rank n — A' — 1, then (G, x) is globally 
rigid in any dimension K. Some graph properties ensuring global rigidity for K G {2, 3} are given in 
[5]. A related problem, that of choosing a given subset of vertices to take the role of anchors, such that 
the resulting sensor network is uniquely localizable (see Sect. I3.2T4)) . is discussed in [72]. Several results 
on global rigidity (with particular attention to the case K ~ 2) arc surveyed in [lOOj . In particular, it 
is shown in |1001 Thm. 11.3] that Henneberg type II steps (replace an edge {u,w] by two edges 
and where u is a new vertex, then add new edges from v to K — 1 other vertices different from 

w, w) are related to global rigidity in a similar way as Henneberg type I steps (see Sect. I4.2.3P are related 
to rigidity: if a globally rigid graph H is derived from a graph G with at least A' -t- 2 vertices using a 
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Hcnneberg type II step in R , then G is also globally rigid. 

There is an interesting variant of unique localizability which yields a subclass of DGP instances that 
can be realized in polynomial time. Recall that the DGP is strongly NP-hard [188] in general. Moreover, 
it remains NP-hard even when the input is a unit disk graph (Sect. I4.1.4|) [9], and there exists no 
randomized efficient algorithm even when it is known that the input graph is globally rigid |10j . The 
problem becomes tractable under the equivalent assumptions of /i'-unique localizability (a sort of unique 
localizability for fixed K) |201j and universal rigidity |235] (see Sect. I3.2.4p . Specifically, a graph is 
K -uniquely localizable if: (i) it has a unique realization x : V -i' M.^ , (ii) it has a unique realization 

: V ^ for all £ > K, and (iii) for a\\ v G V,i > K we have yf, = (a;„,0), where is the zero 
vector in M.^~^ . Anchors play a crucial role in ensuring that the graph should be globally rigid in R^: 
the subgraph induced by the anchors should yield a generic globally rigid framework in R^, thus the set 
of anchors must have at least K + 1 elements. Under these assumptions, an exact polynomial algorithm 
(exploiting the SDP formulation and its dual) for realizing /iT-uniquely localizable graphs was described 

in fm\ . 

4.1.2 Semidefinite Programming 

Most of the recent methods addressing the WNSL make use of SDP techniques. This is understandable 
in view of the relationship between DG and SDP via Thm. 12. 2[ and because PSD completion is actually 
a special case of the general SDP feasibility problem (see Sect. I2.6.ip . We also mention that most SDP 
methods can target DGP problem variants where the edge weight d maps into bounded intervals, not 
only reals, and are therefore suitable for applications where distance measurements are not precise. 

We believe [101] is the first reference in the literature that proposes an SDP-based method for solving 
MCPs (specifically, the PSDMCP). In 0, the same approach is adapted to a slightly different EDMCP 
formulation. Instead of a partial matrix, an n x n pre-distance matrix A is given, i.e. a matrix with zero 
diagonal and nonnegative off-diagonal elements. We look for an n x n Euclidean distance matrix D that 
minimizes \\H o (^A — D)\\p, where _ff is a given matrix of weights, o is the Hadamard product, and || • \\f is 

the Frobenius norm (1|Q||f = ^J^ij<n ifj)- optional linear constraint can be used to fix some of the 
values of D. A reformulation of the constraint "Z? is a Euclidean distance matrix" to AT )^ 0, is derived 
by means of the statement that D is a Euclidean distance matrix if and only if D is negative semidefinite 
on the orthogonal complement of the all-one vector [STl I177j (see Sect. I3.4T4)) . In turn, this is related to 
Thm. O 

In |32l l60] , interestingly, the connection with SDP is not given by Thm. 12.21 but rather because the 
WSNL variants mentioned in the paper make use of convex norm constraints which are reformulated 
using Linear Matrix Inequalities (LMI). For example, if there is a direct communication link between two 
nodes u,v V , then — Xy \\ < r, where r is a scalar threshold given by the maximum communication 
range, the inequality can be reformulated to the following LMI: 

/ rl2 Xu - Xy 

where Ik is the K x K identity matrix (with K = 2). 

Biswas and Ye proposed in [25] an SDP formulation of the WSNL problem which then gave rise to a 
series of papers [HJ [521 [121 [101 [131 [H] focusing on algorithmic exploitations of their formulation. In the 
spirit of jl33] . this can be derived from the "classic" WSNL feasibility formulation below by means of a 
sequence of basic reformulations: 

y{u,v} e E (||a;„ - a:i,||2 = d™) 
VueA^v^A {{u,v} e E ^ \\au ~ Xy\\ = d™), 

where A C V is the set of anchors whose positions {a„ | m G A} C are known a priori. Let X be the 
K X n decision variable matrix whose v-th column is Xy. The authors remark that: 
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• for all u < V £ V, ||.t„ — x^W^ ~ e„.u^X^Xe„„, where Cuv = 1 at component u, —1 at component 
V, and elsewhere; 

• for all u & A,v G V, \\au — x^W^ = (a„; e„)^[/if ; ; X](a„; Ci,), where (a„;et,) is the cohimn 
(K + n)vectoT consisting of a„ on top of with ey = 1 at component v and elsewhere, and 
[Ik', X] is the K x {K + n) matrix consisting of Ik followed by X; 

• [Ik] X]^ [Ik] "'^1 ^ X ) ' ^ ^^'^ ^ ^ ^ matrix denoted by Z; 

• the scalar products of decision variable vectors in X^ X (rows of X^ by columns of X) can be 
linearized, replacing each XuXv by Uuv, which results in substituting X^ X by an n x n matrix 
y = iVuv) such that Y = X^X. 



This yields the following formulation of the WSNL: 

V{u, v} e E Cuv^YCuv 

\tu e A,v ^ A ({w, v} e E ^ (a„; e„)^Z(a„; Cy) 

Y = X^X. 

The SDP relaxation of the constraint Y = X^ X , which is equivalent to requiring that Y has rank K, 
consists in replacing it with Y — X^ X > 0, which is equivalent to Z ^ 0. The whole SDP can be written 
in function of the indeterminate matrix Z as follows, using Matlab-like notation to indicate submatrices: 

= Ik (21) 

= dl,) (22) 

= dl,) (23) 

h 0, (24) 

where • is the Frobenius product. This formulation was exploited algorithmically in a number of ways. 
As mentioned in Sect. 14.1.11 and 13.2.41 solving the SDP formulation ((2T|) - ((24| yields a polynomial-time 
algorithm for the DGP on uniquely localizable graphs (see Sect. I3.2.4[) . The proof uses the dual SDP 
formulation to (PT|) - ([M)) in order to show that the interior point method for SDP yields an exact solution 
[2011 Cor. 1] and the fact that the SDP solution on uniquely localizable graphs has rank K |2011 Thm. 2]. 
Another interesting research direction employing (PT|) - (P^ is the edge-based SDP (ESDP) relaxation 
[22 Ij : this consists in relaxing ([M|) to only hold on principal submatrices of Z indexed by A. To address 
the fact that SDP and ESDP formulations are very sensitive to noisy data, a robust version of the ESDP 
relaxation was discussed in |173| (see Sect. 13.2.^ . 

Among the methods based on formulation ((2T|) - ([24| . [23j[24] are particularly interesting. They address 
the limited scaling capabilities of SDP solution techniques by identifying vertex clusters where embedding 
is easier, and then match those cmbcddings in space using a modified SDP formulation. The vertex 
clusters cover V in such a way that neighbouring clusters share some vertices (these arc used to "stitch 
together" the embeddings restricted to each cluster). The clustering technique is based on permuting 
columns of the distance matrix {dij) so as to try to pool the nonzeros along the main diagonal. The 
partial embeddings for each cluster arc computed by first solving an SDP relaxation of the quadratic 
system (jl6p restricted to edges in the cluster, and then applying a local NLP optimization algorithm 
that uses the optimal SDP solution as a starting point. When the distances have errors, there may not 
exist any valid embedding satisfying all the distance constraints. In this case, it is likely that the SDP 
approach (which relaxes these constraints anyhow) will end up yielding an embedding x' which is valid 
in a higher dimensional space where K' > K. In such cases, x' is projected onto an embedding 
X in R^. Such projected embeddings usually exhibit clusters of close vertices (none of which satisfies 
the corresponding distance constraints), due to correct distances in the higher dimensional space being 
"squeezed" to their orthogonal projection into the lower dimensional space. In order to counter this type 
of behaviour, a regularization objective max^^ -^y \ [xi — Xj[\'^ is added to the feasibility SDP. 




Zl:KS:K 

Vm, u e F \ A ({u, v} eE ^ (0; e„„)(0; euvV • Z 
Vm e A, w e \ A ({u, (a„; e«)(a„; e„)^ • Z 

Z 
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In |105l 1104] ■ Krislock and Wolkowicz also exploit the SDP formulations of [5] together with ver- 
tex clustering techniques in order to improve the scaling abilities of SDP solution methods (also see 
Sect. I3.2.4p . Their facial reduction algorithm identifies cliques in the input graph G and itcratively ex- 
pands them using a iiT-trilateration order (see Sect. I3.3|) . Rather than "stitching together" pieces, as 
in [21], the theory of facial reduction methods works by considering the SDP relaxation of the whole 
problem and showing how it can be simplified in presence of one or more cliques (be they intersecting or 
disjoint). The computational results of |105) show that the facial reduction algorithm scales extremely 
well (graphs up to 100,000 vertices were embedded in M'^). A comparison with the BP algorithm (see 
Sect. 13. appears in jl22[ Table 6]. The BP algorithm is less accurate (the most common LDE values 
are 0(10"^^) for BP and 0(10^^^) for facial reduction) but faster (BP scores between 1% and 10% of 
the time taken by facial reduction). 



4.1.3 Second-order cone programming 

A second-order cone programming (SOCP) relaxation of the WSNL was discussed in [215| . The NLP 
formulation ([7]) is first reformulated as follows: 

min J2 ^uv 

{u,v}£E 

V{w, v} e E Xu- Xy ^ Wuv I 

y{u,v}€E Vuv-Zuv = C f 

y{u,v}eE WWuvW^ = Vuv 

u > 0. , 

Next, the constraint — Vuv is relaxed to ||mui,|p < The SOCP relaxation is weaker than 

the SDP one ( (PTjl - ljM)) ). but scales much better (4000 vs. 500 vertices). It was abandoned by Tseng in 
favour of the ESDP |173j . which is stronger than the SOCP relaxation but scales similarly. 



4.1.4 Unit disk graphs 

Unit disk graphs are intersection graphs of equal circles in the plane, i.e. vertices are the circle centers, 
and there is an edge between two vertices u, v if their Euclidean distance is at most twice the radius. 
Unit disk graphs provide a good model for broadcast networks, with each center representing a mobile 
transmitter/receiver, and the radius representing the range. In [41j . it is shown that several standard 
NP-complete graph problems are just as difficult on unit disk graphs as on general graphs, but that the 
maximum clique problem is polynomial on unit disk graphs (the problem is reduced to finding a maximum 
independent set in a bipartite graph) . In [33] , it is shown that even recognizing whether a graph is a unit 
disk graph is NP-hard. A slightly different version of the problem, consisting in determining whether 
a given weighted graph can be realized in as a unit disk graph of given radius, is also NP-hard [S]. 
From the point of view of DC, it is interesting to remark that the DGP, restricted to sufficiently dense 
unit disk graphs and provided a partial realization is known for a subset of at least K + 1 vertices, can 
be solved in polynomial time j201] . If the graph is sparse, however, the DGP is still NP-hard [TU] . 

The study of unit disk graphs also arises when packing equal spheres in a subset of Euclidean space 
[i^ : the contact graph of the sphere configuration are unit disk graphs. 



4.2 Statics 



Statics is the study of forces acting on physical systems in static equilibrium. This means that the 
barycenter of the system undergoes no linear acceleration (we actually assume the barycenter to have 
zero velocity), and that the system docs not rotate. Geometrically, with respect to a frame of reference, 
the system undergoes no translations and no rotations. The physical systems we are concerned with are 
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bar-and-joiiit structures, i.e. three-dimensional embodiments of graph frameworks (G, x) where G is a 
simple weighted undirected graph and a; is a valid realization thereof: joints are vertices, bars are edges, 
and bar lengths are edge weights. The placement of the structure in physical space provides a valid 
realization of the underlying graph. Because we suppose the structures to be stiff, they cannot undergo 
reflections, either. In short, the equivalence class of a rigid graph frameworks modulo congruences is a 
good representation of a structure in static equilibrium. Naturally, the supporting bar-and-joint structures 
of man-made constructions such as houses, buildings, skyscrapers, bridges and so on must always be in 
static equilibrium, for otherwise the construction would collapse. 

Statics was a field of study ever since humans wanted to have rooves over their heads. The main 
question is the estimation of reaction forces that man-made structures have to provide in order to remain 
in static equilibrium under the action of external forces. In 1725, Varignon published a textbook |217| 
which implemented ideas he had sketched in 1687 about the application of systems of forces to different 
points of static structures. By the mid-1800s, there was both an algebraic and a graphical method for 
testing rigidity of structures. Because of the absence of computing machinery, the latter (called graphical 
statics) was preferred to the former |47l 11861 194] . Cremona proposed a graphical axiomatization of 
arithmetic operations in |48| , whose purpose was probably that of giving an implied equivalence between 
two methods. The algebraic method attracted some attention notwithstanding its numerical difficulties: 
Maxwell proposed a simplified version |150j in 1864. 

4.2.1 Infinitesimal rigidity 

Since statics is mainly concerned with the physical three-dimensional world, we fix X = 3 for the rest of 
this section. Consider a function : V — >■ R'^ that assigns a force vector Fy € M'^ to each point Xy g M.^ 
of a framework (G, a;). If the framework is to be stationary, the total force and torque acting on it must 
be null to prevent translations (assuming a zero initial velocity of the barycenter) and rotations. This 
can be written algebraically [1811 1208j as: 

= (26) 

yi<j<K '^{Fy.Xyj - FyjXy^) ^ 0. (27) 

A force F satisfying Eq. (j26|) - (|27p is called an equilibrium force (or equilibrium load). Applied to bar- 
and-joint structures, equilibrium forces tend to compress or extend the bars without moving the joints in 
space. Since bars are assumed to be stiff (or equivalently, the graph edge weights are given constants), 
the corresponding reaction forces at the endpoint of each bar should be equal in magnitude and opposite 
in sign. We can define these reaction forces by means of an edge weighting a; : — 5- M representing 
the amount of force in each bar per unit length (uj is negative for bar tensions and positive for bar 
compressions). Stiffness of the structure translates algebraically to a balance of equilibrium force and 
reaction: 

yueV Fu+ ^uv{xu~Xy) = 0. (28) 

veN(u) 

A vector a; G R"' satisfying Eq. (j28p is called a resolution, or resolving stress, of the equilibrium force F 
[181j . If F = 0, then w is a self-stress. 

For the following, we introduce (squared) edge functions and displacements. The edge function of a 
framework {G,x) is a function : R"-^ M™ given by (j){x) = {\\xu — Xv\\ \ {u,v} G E). We denote 
the squared edge function (||a;„ — | {u,v} G E) by cj)^. The edge displacement of a framework 

(G, x), with respect to a displacement y, is a continuous function ^ : [0,1] — t- R™ given by fj,{t) = 
{\\yu{t) — yv{t)\\ I {u,v} G E). We denote the squared edge displacement {\\yu{t) — yi;(i)|p | {u,v} G E) 
by 
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Eq. (j28[) can also be written as 

where dcj)^ is the matrix whose {u, u}-th row encodes the derivatives of the {u, uj-th component of the 
squared edge function (jP{x) with respect to each component Xyi of x. Observe that the {u, v}-th row of 
this matrix only has the six nonzero components 2(xui — Xyi) and 2{xyi — Xui) for i £ {1, 2, 3} (see |181[ 
p. 13]). If we now consider Eq. (^9]) applied to a displacement y{t) of x, differentiate it with respect to 
t and evaluate it at i = 0, we obtain the linear system loA = where A = ■^dcjp, i.e. the homogeneous 
version of Eq. [29l 

Consider now a squared edge displacement with respect to a flexing y of the framework {G,x). 

By definition of flexing, we have ^^(t) = (c?^„ | {u,v} £ E) for all t S [0, 1]. Differentiating with respect 

to we obtain the scalar product relation 2(y„(<) — yv{t)) ■ (^^[f^ ^Ht^) ~ ^ (because the edge weights 

duv are constant with respect to t) for all {u, u} G E. Evaluating the derivative at t = yields 

V{u,w}e£' [xu- Xv) ■ [au- tty) (30) 

where a : y ^ R'^ is a map that assigns initial velocities = ^^lo each v £ V. We remark that 
the system (j30p can be written as Aa — [751 Thm. 3.9]. We therefore have the dual relationship 
LuA = = Aa between a and lo. 

By definition, (G, x) is infinitesimally rigid if a only encodes rotations and translations. The above 
discussion should give an intuition as to why this is equivalent to stating that every equilibrium force 
has a resolution (see [751 1181[ 1208) for a full description). Indeed, infinitesimal rigidity was defined in 
this dual way by Whiteley |223) (who called it static rigidity). The matrix A above is called the rigidity 
matrix of the framework {G,x). Notice that, when a valid realization x is known for G, then even those 
distances for {u, v} ^ E can be computed for G: when the rows of A are indexed by all unordered pairs 
{u, w} we call A the complete rigidity matrix of (G, x). 

Infinitesimal rigidity is a stricter notion than rigidity: all infinitesimally rigid frameworks are also 
rigid [781 Thm. 4.1]. Counterexamples to the converse of this statements, i.e. rigid frameworks which are 
infinitesimally flexible, usually turn out to have some kind of degeneracy: a flat triangle, for example, 
is rigid but infinitesimally flexible [1811 Ex. 4.2]. In general, infinitesimally rigid frameworks in K.^^ (for 
some integer K > Q) might fail to be infinitesimally rigid in higher-dimensional spaces [191| . 

4.2.2 Graph rigidity 

An important practical question to be asked about rigidity is whether certain graphs give rise to in- 
finitesimally rigid frameworks just because of their graph topology, independently of their edge weights. 
Bar-and-joint frameworks derived from such graphs are extremely useful in architecture and construc- 
tion engineering. An important concept in answering this question is that of genericity: a realization is 
generic if all its vertex coordinates are algebraically independent over Q. Because the algebraic numbers 
have Lebesgue measure zero in the real numbers, this means that the set of non-generic realizations have 
Lebesgue measure in the set of all realizations. 

Rigidity and infinitesimal rigidity are defined as properties of frameworks, rather than of graphs. 
It turns out, however, that if a graph possesses a single generic rigid framework, then all its generic 
frameworks are rigid [71 Cor. 2]. This also holds for infinitesimal rigidity Moreover, rigidity and 
infinitesimal rigidity are the same notion over the set of all generic frameworks [51 Sect. 3]. By genericity, 
this implies that in almost all cases it makes sense to speak of a "rigid graph" (rather than a rigid 
framework). The Graph Rigidity Problem asks, given a simple undirected graph G, whether it is 
generically rigid. Notice that the input, in this case, does not involve edge weights. For example, any 
graph is almost always flexible for large enough values of K unless it is a clique [71 Cor. 4]. 



DISTANCE GEOMETRY PROBLEMS 



49 



We remark as an aside that, although genericity is required for laying the theoretical foundations 
of graph rigidity (see the proof of [78l Thm. 6.1]), in practice it is too strong. For an edge weighting 
to be algebraically independent over Q, at most one edge weight can be rational (or even algebraic). 
Since computers are usually programmed to only represent rational (or at best algebraic) numbers, no 
generic realization can be treated exactly in any practical algorithmic implementation. The conceptual 
requirement that genericity is really meant to convey is that an infinitesimally rigid generic realization 
will stay rigid even though the edge weighting is perturbed slightly [191j . The definition given in [84] is 
more explicit in this sense: a realization is generic if all the nontrivial minors of the complete rigidity 
matrix have nonzero value. Specifically, notice that the polynomials induced by each minor are algebraic 
relations between the values of the components of each vector in the realization. Naturally, asking for full 
algebraic independence with respect to any polynomial in Q guarantees Graver's definition, but in fact, 
as Graver points out [85] . it is sufficient to enforce algebraic independence with respect to the system of 
polynomials induced by the nontrivial minors of the rigidity matrix (also see Sect. [3373)1 . 

Generic graph rigidity can also be described using the graphical matroid M{G) on G: a set of edges 
is independent if it does not contain simple cycles. The closure of an edge subset F <Z E contains F and 
all edges which form simple cycles with edges of F. We call the edge set F rigid if its closure is the clique 
on the vertices incident on F. A graphical matroid M[G) is an abstract rigidity matroid if it satisfies 
two requirements: (i) if two edge sets are incident to fewer than K common vertices, the closure of their 
union should be the union of their closures; and (ii) if two edge sets are incident to at least K common 
vertices, their union should be a rigid edge set jl91j . Condition (i) loosely says that if the two edge 
sets are not "connected enough" , then their union should give rise to flexible frameworks in R^, as the 
common vertices can be used as a "hinge" in around which the two edge sets can rotate. Condition 
(ii) says that when no such hinges can be found, the union of the two edge sets gives rise to rigid graphs. 
If the only resolution to the zero equilibrium force is the zero vector, then the complete rigidity matrix 
has maximum rank (i.e. it has the maximum possible rank over all embeddings in R"^), and its rows 
naturally induce a matroid on the complete set of edges {{u, u} | u ^ w G T^}, called the rigidity matroid 
of the framework (G,x). It was shown in [84j that if x is generic, then the rigidity matroid is abstract. 

4.2.3 Some classes of rigid graphs 

Euler conjectured in 1766 that all graphs given by the edge incidence of any triangulated polyhedral 
surface are rigid in R'^. This conjecture was proven true for special cases but eventually disproved in 
general. Cauchy proved in 1813 that the conjecture holds for strictly convex polyhedra [35], Alexandrov 
proved in 1950 that it holds for convex polyhedra [T], and Gluck proved in 1975 that it also almost always 
holds for any triangulation of a topological sphere [75]. The general conjecture was finally disproved by 
Connelly in 1977 [33] using a skew octahedron. 

This does not mean to say that there are no purely topological characterizations of rigid graphs. In 
1911, Henneberg described two local procedures (or "steps") to construct new, larger rigid graphs from 
given rigid graphs [M] (if a given graph can be "deconstructed" by using the same procedures backwards, 
then the graph is rigid). The Henneberg type I step is as follows: start with a i^-clique and add new 
vertices adjacent to at least K existing vertices. This defines a vertex order known as Henneberg type I 
order (see Sect. 11.1.^ . The Henneberg type II step is somewhat more involved, and we refer the interested 
reader to the extensive account of Henneberg and Henneberg-like procedures which can be found in [2 08) . 
Here follows a philological note on Henneberg type I orders: although they are always referred to |94j . 
they were actually first defined in a previous book by Henneberg [931 p. 267]. But in fact, a picture with 
a Henneberg type I order in R^ appeared one year earlier, in 1885, in |185[ Fig. 30, PI. XV]. 

Limited to R^, a characterization of all rigid graphs G in R^ was described by Laman in 1970 [110| : 
l^;] = 2]F| — 3 and for every subgraph {V' , E') of G, \E'\ < 2\V'\—3. Equivalent but more easily verifiable 
conditions were proposed in [1471 11781 1207[ . Unluckily, such conditions do not hold for R''. For K > 2, 
no such complete characterization is known as yet; an account of the current conjectures can be found in 
[2241 [55] , and a heuristic method was introduced in [200] . 
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4.3 Other applications 

DG is not limited to these applications, however. For example, an application to the synchronization of 
clocks from the measure of time offsets between pairs of clocks is discussed in |195j . This, incidentally, is 
the only engineering application of the DGPi we are aware of. The solution method involves maximizing 
a quadratic form subject to normalization constraints; this is relaxed to the maximization of the same 
quadratic form over a sphere, which is solved by the normalized eigenvector corresponding to the largest 
eigenvalue. Another application is the localization and control of fleets of autonomous underwater vehicles 
(AUVs) [12 . This is essentially a time-dependent DGP, as the delays in sound measurements provide 
an estimate of AUV-to-AUV distance and an indication of how it varies in time. We remark that GPS 
cannot be used under water, so AUVs must resurface in order to determine their positions precisely. A 
third application to the quantitative analysis of music and rhythm is discussed in |56j . 

In the following section, we briefly discuss two other important engineering applications of DG: data 
visualization by means of multidimensional scaling, and robotics, specifically inverse kinematic calcula- 
tions. In the former, we aim to find a projection in the plane or the space which renders the graph 
visually as close as possible to the higher-dimensional picture (see Sect. I4.3TT|) . In the latter, the main 
issue is to study how a robotic arm (or system of robotic arms) moves in space in order to perform certain 
tasks. Known distances include those from a joint to its neighbouring joints. The main problem is that 
of assigning coordinate values to the position vector of the farthest joint (see Sect. 14. 3T^ . 

4.3.1 Data visualization 

Multidimensional Scaling (MDS) [SU [70] is a visualization tool in data analysis for representing mea- 
surements of dissimilarity among pairs of objects as distances between points in a low-dimensional space 
in such a way that the given dissimilarities are well-approximated by the distances in that space. The 
choice of dimension is arbitrary, but the most frequently used dimensions are 2 and 3. MDS methods 
differ mainly according to the distance model, but the most usual model is the Euclidean one (in order to 
represent correlation measurements, a spherical model can also be used). Other distances, such as the £i 
norm (also called Manhattan distance) are used [6l I216| . The output of MDS provides graphical displays 
that allow decision makers to discover hidden structures in complex data sets. 

MDS techniques have been used primarily in psychology. According to |103j . the first important 
contributions to the theory of MDS are probably |203[ 1204] . but they did not lead to practical methods. 
The contributions to the MDS methods are due to Thurstonian approach, summarized in chapter 11 
of [212| , although the real computational breakthrough was due to Shepard |192l 11931 1194j . The next 
important step was given by Kruskal |106l I107j . who puts Shepard's ideas on a formal way in terms of 
optimization of a least squares function. Two important contributions after Shepard-Kruskal works are 
[36] and [206] . 

Measurements of dissimilarity among n objects can be represented by a dissimilarity matrix D = (dij). 
The goal of MDS is to construct a set of points Xi € (for i < n and K low, typically K g {2, 3}) 
corresponding to those n objects such that pairwise distances approximate pairwise object dissimilarities 
(also see the APA method in Sect. I3.4.4p . MDS is complementary to Principal Component Analysis 
(PGA) [50], in the following sense. Given a set X of n points in (with H "high"), PGA finds a K- 
dimensional subspace of M'^ (with K "small") on which to project X in such a way that the variance of 
the projection is maximum (essentially, PGA attempts to avoid projections where two distant points are 
projected very close). PGA might lose some distance information in the projection, but the remaining 
information is not distorted. MDS identifies a /C-dimensional subspace t of M.^ which minimizes the 
discrepancy between the original dissimilarity matrix D of the points in X and the dissimilarity matrix 
D' obtained by the projection on r of the points in X [HI]. In other words, MDS attempts to represents 
all distance information in the projection, even if this might mean that the information is distorted. 
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4.3.2 Robotics 

Kinematics is the branch of mechanics concerning the geometric analysis of motion. The kinematic 
analysis of rigid bodies connected by flexible joints has many similarities with the geometric analysis of 
molecules, when the force effects are ignored. 

The fundamental DG problem in robotics is known as the Inverse Kinematic Problem (IKP — see 
Item [15] in the list of Sect. II. 2|) . Geometric constructive methods can be applied to solve the IKP [75], 
but algebraic techniques are more suitable to handle more general instances. Reviews of these techniques 
in the context of robotics and molecular conformation can be found, for example, in jl691 1551 1179] . There 
are three main classes of methods in this category: those that use algebraic geometry, those based on 
continuation techniques, and those based on interval analysis. 

In general, the solution of the IKP leads to a system of polynomial equations. The methods based 
on algebraic geometry reduce the polynomial system to a univariate polynomial, where the roots of 
this polynomial yield all solutions of the original system |149l I35j . Continuation methods, originally 
developed in |182j . start with an initial system, whose solutions are known, and transform it into the 
system of interest, whose solutions are sought. In |213) . using continuation methods, it was shown that 
the inverse kinematics of the general 6R manipulator (an arm system with six rotatable bonds with fixed 
lengths and angles [HS]) has 16 solutions; more information can be found in |220j . 

A type of interval method applied to IKP is related to the interval version of the Newton method 
[176j . and others are based on the iterative division of the distance space of the problem |137| . An 
interesting method in the latter class |209j essentially consists in solving a EDMCP whose entries are 
intervals (see Sect. I2.6] and l2.6?2|) . When the distance matrix is complete, the realization of the selected 
points can be carried out in polynomial time (see e.g. [1991 155] ). In order to determine the values for the 
unknown distances, in jl75j . a range is initially assigned to the unknowns and their bounds are reduced 
using a branch-and-prune technique, which iteratively eliminates from the distance space entire regions 
which cannot contain any solution. This elimination is accomplished by applying conditions derived from 
the theory of distance geometry. This branch-and-prune technique is different from the BP algorithm 
discussed in Sect. 13.3] and 13.51 as the search space is continuous in the former and discrete in the latter. 
Another branch-and-prune scheme for searching continuous space is described in |234) . This is applied 
to molecular conformational calculations related to computer-assisted drug design. 



5 Conclusion 

Euclidean distance geometry is an extensive field with major biological, statistical and engineering appli- 
cations. The foundation of its theory was laid around a century ago by mathematicians such as Cayley, 
Menger, Schoenberg, Blumenthal and Godel. Recent extensions, targeting the inverse problem of de- 
termining a distance space given a partial distance function, contribute further mathematical as well as 
applied interest to the field. Because of the breadth and maturity of this field, our survey makes no 
claim to completeness; furthermore, we admit to a personal bias towards applications to molecular con- 
formation. We strove, however, to give the reader a sufficiently informative account of the most useful, 
interesting, and beautiful results of Euclidean distance geometry. 
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